ARM document

Document summary

ARM v7 Reference manual

Document preview

ARM® Architecture Reference Manual
ARM®v7-A and ARM®v7-R edition
Errata markup

Copyright © 1996-1998, 2000, 2004-2011 ARM. All rights reserved.
ARM DDI 0406B_errata_2011_Q3 (ID120611)

Release Information

The following changes have been made to this document.

Change History

Date            Issue  Confidentiality   Change
05 April 2007   A      Non-Confidential  New edition for ARMv7-A and ARMv7-R architecture profiles. Document number changed from ARM DDI 0100 to ARM DDI 0406, contents restructured.
29 April 2008   B      Non-Confidential  Addition of the VFP Half-precision and Multiprocessing Extensions, and many clarifications and enhancements.
November 2008   B      Non-Confidential  PDF with errata issued, errata identified as ARM_2008_Q4.
March 2009      B      Non-Confidential  PDF reissued with additional errata. Additional errata identified as ARM_2009_Q1.
July 2009       B      Non-Confidential  PDF reissued with additional errata. Additional errata identified as ARM_2009_Q2.
October 2009    B      Non-Confidential  PDF reissued with additional errata. Additional errata identified as ARM_2009_Q3.
February 2010   B      Non-Confidential  PDF reissued with additional errata. Additional errata identified as ARM_2009_Q4.
July 2010       B      Non-Confidential  PDF reissued with additional errata. Additional errata identified as ARM_2010_Q2.
October 2010    B      Non-Confidential  PDF reissued with additional errata. Additional errata identified as ARM_2010_Q3.
June 2011       B      Non-Confidential  PDF reissued with additional errata. Additional errata identified as ARM_2011_Q2.
December 2011   B      Non-Confidential  PDF reissued with additional errata. Additional errata identified as ARM_2011_Q3.

From ARMv7, the ARM architecture defines different architectural profiles and this edition of this manual describes only the A and R profiles.
For details of the documentation of the ARMv7-M profile see Additional reading on page xxiii.

Before ARMv7 there was only a single ARM Architecture Reference Manual, with document number DDI 0100. The first issue of this was in February 1996, and the final issue, Issue I, was in July 2005. For more information see Additional reading on page xxiii.

Note
ARM published Issue C of this document, ARM DDI 0406C, in November 2011. That issue incorporates all of the corrections and enhancements shown in the ARM_2011_Q3 errata mark-up, and can be downloaded from the Infocenter, http://infocenter.arm.com. Therefore, ARM will not publish any further errata PDFs for Issue B of the document.

Proprietary Notice

This ARM Architecture Reference Manual is protected by copyright and the practice or implementation of the information herein may be protected by one or more patents or pending applications. No part of this ARM Architecture Reference Manual may be reproduced in any form by any means without the express prior written permission of ARM. No license, express or implied, by estoppel or otherwise to any intellectual property rights is granted by this ARM Architecture Reference Manual.

Your access to the information in this ARM Architecture Reference Manual is conditional upon your acceptance that you will not use or permit others to use the information for the purposes of determining whether implementations of the ARM architecture infringe any third party patents.

This ARM Architecture Reference Manual is provided “as is”.
ARM makes no representations or warranties, either express or implied, included but not limited to, warranties of merchantability, fitness for a particular purpose, or non-infringement, that the content of this ARM Architecture Reference Manual is suitable for any particular purpose or that any practice or implementation of the contents of the ARM Architecture Reference Manual will not infringe any third party patents, copyrights, trade secrets, or other rights.

This ARM Architecture Reference Manual may include technical inaccuracies or typographical errors.

To the extent not prohibited by law, in no event will ARM be liable for any damages, including without limitation any direct loss, lost revenue, lost profits or data, special, indirect, consequential, incidental or punitive damages, however caused and regardless of the theory of liability, arising out of or related to any furnishing, practicing, modifying or any use of this ARM Architecture Reference Manual, even if ARM has been advised of the possibility of such damages.

Words and logos marked with ® or TM are registered trademarks or trademarks of ARM Limited, except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be the trademarks of their respective owners.

Copyright © 1996-1998, 2000, 2004-2011 ARM Limited
110 Fulbourn Road Cambridge, England CB1 9NJ

Restricted Rights Legend: Use, duplication or disclosure by the United States Government is subject to the restrictions set forth in DFARS 252.227-7013 (c)(1)(ii) and FAR 52.227-19.

This document is Non-Confidential but any disclosure by you is subject to you providing notice to and the acceptance by the recipient of, the conditions set out above.

In this document, where the term ARM is used to refer to the company it means “ARM or any of its subsidiaries as appropriate”.

Note
The term ARM is also used to refer to versions of the ARM architecture, for example ARMv6 refers to version 6 of the ARM architecture.
The context makes it clear when the term is used in this way.

Note
For this errata PDF, pages i to iv have been replaced, by an edit to the PDF, to include an updated Proprietary Notice, and to include the errata PDFs in the Change History table. The remainder of the PDF is the original release PDF of issue B of the document, with errata markups added.

Contents

ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition

Preface
    About this manual
    Using this manual
    Conventions
    Further reading
    Feedback

Part A  Application Level Architecture

Chapter A1  Introduction to the ARM Architecture
    A1.1  About the ARM architecture
    A1.2  The ARM and Thumb instruction sets
    A1.3  Architecture versions, profiles, and variants
    A1.4  Architecture extensions
    A1.5  The ARM memory model
    A1.6  Debug

Chapter A2  Application Level Programmers’ Model
    A2.1  About the Application level programmers’ model
    A2.2  ARM core data types and arithmetic
    A2.3  ARM core registers
    A2.4  The Application Program Status Register (APSR)
    A2.5  Execution state registers
    A2.6  Advanced SIMD and VFP extensions
    A2.7  Floating-point data types and arithmetic
    A2.8  Polynomial arithmetic over {0,1}
    A2.9  Coprocessor support
    A2.10  Execution environment support
    A2.11  Exceptions, debug events and checks

Chapter A3  Application Level Memory Model
    A3.1  Address space
    A3.2  Alignment support
    A3.3  Endian support
    A3.4  Synchronization and semaphores
    A3.5  Memory types and attributes and the memory order model
    A3.6  Access rights
    A3.7  Virtual and physical addressing
    A3.8  Memory access order
    A3.9  Caches and memory hierarchy

Chapter A4  The Instruction Sets
    A4.1  About the instruction sets
    A4.2  Unified Assembler Language
    A4.3  Branch instructions
    A4.4  Data-processing instructions
    A4.5  Status register access instructions
    A4.6  Load/store instructions
    A4.7  Load/store multiple instructions
    A4.8  Miscellaneous instructions
    A4.9  Exception-generating and exception-handling instructions
    A4.10  Coprocessor instructions
    A4.11  Advanced SIMD and VFP load/store instructions
    A4.12  Advanced SIMD and VFP register transfer instructions
    A4.13  Advanced SIMD data-processing operations
    A4.14  VFP data-processing instructions

Chapter A5  ARM Instruction Set Encoding
    A5.1  ARM instruction set encoding
    A5.2  Data-processing and miscellaneous instructions
    A5.3  Load/store word and unsigned byte
    A5.4  Media instructions
    A5.5  Branch, branch with link, and block data transfer
    A5.6  Supervisor Call, and coprocessor instructions
    A5.7  Unconditional instructions

Chapter A6  Thumb Instruction Set Encoding
    A6.1  Thumb instruction set encoding
    A6.2  16-bit Thumb instruction encoding
    A6.3  32-bit Thumb instruction encoding

Chapter A7  Advanced SIMD and VFP Instruction Encoding
    A7.1  Overview
    A7.2  Advanced SIMD and VFP instruction syntax
    A7.3  Register encoding
    A7.4  Advanced SIMD data-processing instructions
    A7.5  VFP data-processing instructions
    A7.6  Extension register load/store instructions
    A7.7  Advanced SIMD element or structure load/store instructions
    A7.8  8, 16, and 32-bit transfer between ARM core and extension registers
    A7.9  64-bit transfers between ARM core and extension registers

Chapter A8  Instruction Details
    A8.1  Format of instruction descriptions
    A8.2  Standard assembler syntax fields
    A8.3  Conditional execution
    A8.4  Shifts applied to a register
    A8.5  Memory accesses
    A8.6  Alphabetical list of instructions

Chapter A9  ThumbEE
    A9.1  The ThumbEE instruction set
    A9.2  ThumbEE instruction set encoding
    A9.3  Additional instructions in Thumb and ThumbEE instruction sets
    A9.4  ThumbEE instructions with modified behavior
    A9.5  Additional ThumbEE instructions

Part B  System Level Architecture

Chapter B1  The System Level Programmers’ Model
    B1.1  About the system level programmers’ model
    B1.2  System level concepts and terminology
    B1.3  ARM processor modes and core registers
    B1.4  Instruction set states
    B1.5  The Security Extensions
    B1.6  Exceptions
    B1.7  Coprocessors and system control
    B1.8  Advanced SIMD and floating-point support
    B1.9  Execution environment support

Chapter B2  Common Memory System Architecture Features
    B2.1  About the memory system architecture
    B2.2  Caches
    B2.3  Implementation defined memory system features
    B2.4  Pseudocode details of general memory system operations

Chapter B3  Virtual Memory System Architecture (VMSA)
    B3.1  About the VMSA
    B3.2  Memory access sequence
    B3.3  Translation tables
    B3.4  Address mapping restrictions
    B3.5  Secure and Non-secure address spaces
    B3.6  Memory access control
    B3.7  Memory region attributes
    B3.8  VMSA memory aborts
    B3.9  Fault Status and Fault Address registers in a VMSA implementation
    B3.10  Translation Lookaside Buffers (TLBs)
    B3.11  Virtual Address to Physical Address translation operations
    B3.12  CP15 registers for a VMSA implementation
    B3.13  Pseudocode details of VMSA memory system operations

Chapter B4  Protected Memory System Architecture (PMSA)
    B4.1  About the PMSA
    B4.2  Memory access control
    B4.3  Memory region attributes
    B4.4  PMSA memory aborts
    B4.5  Fault Status and Fault Address registers in a PMSA implementation
    B4.6  CP15 registers for a PMSA implementation
    B4.7  Pseudocode details of PMSA memory system operations

Chapter B5  The CPUID Identification Scheme
    B5.1  Introduction to the CPUID scheme
    B5.2  The CPUID registers
    B5.3  Advanced SIMD and VFP feature identification registers

Chapter B6  System Instructions
    B6.1  Alphabetical list of instructions

Part C  Debug Architecture

Chapter C1  Introduction to the ARM Debug Architecture
    C1.1  Scope of part C of this manual
    C1.2  About the ARM Debug architecture
    C1.3  Security Extensions and debug
    C1.4  Register interfaces

Chapter C2  Invasive Debug Authentication
    C2.1  About invasive debug authentication

Chapter C3  Debug Events
    C3.1  About debug events
    C3.2  Software debug events
    C3.3  Halting debug events
    C3.4  Generation of debug events
    C3.5  Debug event prioritization

Chapter C4  Debug Exceptions
    C4.1  About debug exceptions
    C4.2  Effects of debug exceptions on CP15 registers and the DBGWFAR

Chapter C5  Debug State
    C5.1  About Debug state
    C5.2  Entering Debug state
    C5.3  Behavior of the PC and CPSR in Debug state
    C5.4  Executing instructions in Debug state
    C5.5  Privilege in Debug state
    C5.6  Behavior of non-invasive debug in Debug state
    C5.7  Exceptions in Debug state
    C5.8  Memory system behavior in Debug state
    C5.9  Leaving Debug state

Chapter C6  Debug Register Interfaces
    C6.1  About the debug register interfaces
    C6.2  Reset and power-down support
    C6.3  Debug register map
    C6.4  Synchronization of debug register updates
    C6.5  Access permissions
    C6.6  The CP14 debug register interfaces
    C6.7  The memory-mapped and recommended external debug interfaces

Chapter C7  Non-invasive Debug Authentication
    C7.1  About non-invasive debug authentication
    C7.2  v7 Debug non-invasive debug authentication
    C7.3  Effects of non-invasive debug authentication
    C7.4  ARMv6 non-invasive debug authentication

Chapter C8  Sample-based Profiling
    C8.1  Program Counter sampling

Chapter C9  Performance Monitors
    C9.1  About the performance monitors
    C9.2  Status in the ARM architecture
    C9.3  Accuracy of the performance monitors
    C9.4  Behavior on overflow
    C9.5  Interaction with Security Extensions
    C9.6  Interaction with trace
    C9.7  Interaction with power saving operations
    C9.8  CP15 c9 register map
    C9.9  Access permissions
    C9.10  Event numbers

Chapter C10  Debug Registers Reference
    C10.1  Accessing the debug registers
    C10.2  Debug identification registers
    C10.3  Control and status registers
    C10.4  Instruction and data transfer registers
    C10.5  Software debug event registers
    C10.6  OS Save and Restore registers, v7 Debug only
    C10.7  Memory system control registers
    C10.8  Management registers, ARMv7 only
    C10.9  Performance monitor registers

Appendix A  Recommended External Debug Interface
    A.1  System integration signals
    A.2  Recommended debug slave port

Appendix B  Common VFP Subarchitecture Specification
    B.1  Scope of this appendix
    B.2  Introduction to the Common VFP subarchitecture
    B.3  Exception processing
    B.4  Support code requirements
    B.5  Context switching
    B.6  Subarchitecture additions to the VFP system registers
    B.7  Version 1 of the Common VFP subarchitecture
    B.8  Version 2 of the Common VFP subarchitecture

Appendix C  Legacy Instruction Mnemonics
    C.1  Thumb instruction mnemonics
    C.2  Pre-UAL pseudo-instruction NOP

Appendix D  Deprecated and Obsolete Features
    D.1  Deprecated features
    D.2  Deprecated terminology
    D.3  Obsolete features
    D.4  Semaphore instructions
    D.5  Use of the SP as a general-purpose register
    D.6  Explicit use of the PC in ARM instructions
    D.7  Deprecated Thumb instructions

Appendix E  Fast Context Switch Extension (FCSE)
    E.1  About the FCSE
    E.2  Modified virtual addresses
    E.3  Debug and trace

Appendix F  VFP Vector Operation Support
    F.1  About VFP vector mode
    F.2  Vector length and stride control
    F.3  VFP register banks
    F.4  VFP instruction type selection

Appendix G  ARMv6 Differences
    G.1  Introduction to ARMv6
    G.2  Application level register support
    G.3  Application level memory support
    G.4  Instruction set support
    G.5  System level register support
    G.6  System level memory model
    G.7  System Control coprocessor (CP15) support

Appendix H  ARMv4 and ARMv5 Differences
    H.1  Introduction to ARMv4 and ARMv5
    H.2  Application level register support
    H.3  Application level memory support
    H.4  Instruction set support
    H.5  System level register support
    H.6  System level memory model
    H.7  System Control coprocessor (CP15) support

Appendix I  Pseudocode Definition
    I.1  Instruction encoding diagrams and pseudocode
    I.2  Limitations of pseudocode
    I.3  Data types
    I.4  Expressions
    I.5  Operators and built-in functions
    I.6  Statements and program structure
    I.7  Miscellaneous helper procedures and functions

Appendix J  Pseudocode Index
    J.1  Pseudocode operators and keywords
    J.2  Pseudocode functions and procedures

Appendix K  Register Index
    K.1  Register index

Glossary

ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved.

Preface

This preface summarizes the contents of this manual and lists the conventions it uses.
It contains the following sections:
• About this manual on page xiv
• Using this manual on page xv
• Conventions on page xviii
• Further reading on page xx
• Feedback on page xxi.

About this manual

This manual describes the ARM®v7 instruction set architecture, including its high code density Thumb® instruction encoding and the following extensions to it:
• The System Control coprocessor, coprocessor 15 (CP15), used to control memory system components such as caches, write buffers, Memory Management Units, and Protection Units.
• The optional Advanced SIMD extension, that provides high-performance integer and single-precision floating-point vector operations.
• The optional VFP extension, that provides high-performance floating-point operations. It can optionally support double-precision operations.
• The Debug architecture, that provides software access to debug features in ARM processors.

Part A describes the application level view of the architecture. It describes the application level view of the programmers’ model and the memory model. It also describes the precise effects of each instruction in User mode (the normal operating mode), including any restrictions on its use. This information is of primary importance to authors and users of compilers, assemblers, and other programs that generate ARM machine code.

Part B describes the system level view of the architecture. It gives details of system registers that are not accessible from User mode, and the system level view of the memory model. It also gives full details of the effects of instructions in privileged modes (any mode other than User mode), where these are different from their effects in User mode.

Part C describes the Debug architecture. This is an extension to the ARM architecture that provides configuration, breakpoint and watchpoint support, and a Debug Communications Channel (DCC) to a debug host.
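As an illustration of how software reaches the System Control coprocessor mentioned above, the following is a minimal sketch that assembles the 32-bit ARM-state encoding of a coprocessor register transfer (MRC to read, MCR to write). The field layout is an assumption taken from the general ARM coprocessor instruction encoding, not from this preface, and the function name encode_coproc_transfer is hypothetical.

```python
def encode_coproc_transfer(coproc, opc1, rt, crn, crm, opc2, *, read, cond=0xE):
    """Return the ARM-state encoding of MRC (read=True) or MCR (read=False).

    Assumed field layout (not stated in this preface):
    cond[31:28] 1110[27:24] opc1[23:21] L[20] CRn[19:16] Rt[15:12]
    coproc[11:8] opc2[7:5] 1[4] CRm[3:0]
    """
    # Field name -> (value, width in bits); reject values that do not fit.
    fields = {
        "cond": (cond, 4), "opc1": (opc1, 3), "crn": (crn, 4), "rt": (rt, 4),
        "coproc": (coproc, 4), "opc2": (opc2, 3), "crm": (crm, 4),
    }
    for name, (value, width) in fields.items():
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")

    return (
        (cond << 28) | (0b1110 << 24) | (opc1 << 21) | (int(read) << 20)
        | (crn << 16) | (rt << 12) | (coproc << 8) | (opc2 << 5) | (1 << 4) | crm
    )


# MRC p15, 0, r0, c0, c0, 0 reads the Main ID Register into r0.
midr_read = encode_coproc_transfer(15, 0, 0, 0, 0, 0, read=True)
print(f"{midr_read:#010x}")  # 0xee100f10
```

The same helper with read=False gives the MCR form used to write a CP15 register. Which registers exist, and what they control, depends on the memory system architecture described in Part B.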
Assembler syntax is given for the instructions described in this manual, permitting instructions to be specified in textual form. However, this manual is not intended as tutorial material for ARM assembler language, nor does it describe ARM assembler language at anything other than a very basic level. To make effective use of ARM assembler language, consult the documentation supplied with the assembler being used.

Using this manual

The information in this manual is organized into four parts, as described below.

Part A, Application Level Architecture

Part A describes the application level view of the architecture. It contains the following chapters:

Chapter A1    Gives a brief overview of the ARM architecture, and the ARM and Thumb instruction sets.
Chapter A2    Describes the application level view of the ARM programmers’ model, including the application level view of the Advanced SIMD and VFP extensions. It describes the types of value that ARM instructions operate on, the general-purpose registers that contain those values, and the Application Program Status Register.
Chapter A3    Describes the application level view of the memory model, including the ARM memory types and attributes, and memory access control.
Chapter A4    Describes the range of instructions available in the ARM, Thumb, Advanced SIMD, and VFP instruction sets. It also contains some details of instruction operation, where these are common to several instructions.
Chapter A5    Gives details of the encoding of the ARM instruction set.
Chapter A6    Gives details of the encoding of the Thumb instruction set.
Chapter A7    Gives details of the encoding of the Advanced SIMD and VFP instruction sets.
Chapter A8    Provides detailed reference information about every instruction available in the Thumb, ARM, Advanced SIMD, and VFP instruction sets, with the exception of information only relevant in privileged modes.
Chapter A9  Provides detailed reference information about the ThumbEE (Execution Environment) variant of the Thumb instruction set.

Part B, System Level Architecture

Part B describes the system level view of the architecture. It contains the following chapters:
Chapter B1  Describes the system level view of the programmers’ model.
Chapter B2  Describes the system level view of the memory model features that are common to all memory systems.
Chapter B3  Describes the system level view of the Virtual Memory System Architecture (VMSA) that is part of all ARMv7-A implementations. This chapter includes descriptions of all of the CP15 System Control Coprocessor registers in a VMSA implementation.
Chapter B4  Describes the system level view of the Protected Memory System Architecture (PMSA) that is part of all ARMv7-R implementations. This chapter includes descriptions of all of the CP15 System Control Coprocessor registers in a PMSA implementation.
Chapter B5  Describes the CPUID scheme.
Chapter B6  Provides detailed reference information about system instructions, and more information about instructions where they behave differently in privileged modes.

Part C, Debug Architecture

Part C describes the Debug architecture. It contains the following chapters:
Chapter C1  Gives a brief introduction to the Debug architecture.
Chapter C2  Describes the authentication of invasive debug.
Chapter C3  Describes the debug events.
Chapter C4  Describes the debug exceptions.
Chapter C5  Describes Debug state.
Chapter C6  Describes the permitted debug register interfaces.
Chapter C7  Describes the authentication of non-invasive debug.
Chapter C8  Describes sample-based profiling.
Chapter C9  Describes the ARM performance monitors.
Chapter C10  Describes the debug registers.

Part D, Appendices

This manual contains the following appendices:
Appendix A  Describes the recommended external Debug interfaces.
    Note: This description is not part of the ARM architecture specification. It is included here only as supplementary information, for the convenience of developers and users who might require this information.
Appendix B  The Common VFP subarchitecture specification.
    Note: This specification is not part of the ARM architecture specification. This sub-architectural information is included here only as supplementary information, for the convenience of developers and users who might require this information.
Appendix C  Describes the legacy mnemonics.
Appendix D  Identifies the deprecated architectural features.
Appendix E  Describes the Fast Context Switch Extension (FCSE). From ARMv6, the use of this feature is deprecated, and in ARMv7 the FCSE is optional.
Appendix F  Describes the VFP vector operations. Use of these operations is deprecated in ARMv7.
Appendix G  Describes the differences in the ARMv6 architecture.
Appendix H  Describes the differences in the ARMv4 and ARMv5 architectures.
Appendix I  The formal definition of the pseudocode.
Appendix J  Index to definitions of pseudocode operators, keywords, functions, and procedures.
Appendix K  Index to register descriptions in the manual.

Conventions

This manual employs typographic and other conventions intended to improve its ease of use.

General typographic conventions

typewriter
    Is used for assembler syntax descriptions, pseudocode descriptions of instructions, and source code examples. In the cases of assembler syntax descriptions and pseudocode descriptions, see the additional conventions below.
    The typewriter style is also used in the main text for instruction mnemonics and for references to other items appearing in assembler syntax descriptions, pseudocode descriptions of instructions and source code examples.
italic
    Highlights important notes, introduces special terminology, and denotes internal cross-references and citations.
bold
    Is used for emphasis in descriptive lists and elsewhere, where appropriate.
SMALL CAPITALS
    Are used for a few terms that have specific technical meanings. Their meanings can be found in the Glossary.

Signals

In general this specification does not define processor signals, but it does include some signal examples and recommendations. It uses the following signal conventions:
Signal level
    The level of an asserted signal depends on whether the signal is active-HIGH or active-LOW. Asserted means:
    • HIGH for active-HIGH signals
    • LOW for active-LOW signals.
Lower-case n
    At the start or end of a signal name denotes an active-LOW signal.

Numbers

Numbers are normally written in decimal. Binary numbers are preceded by 0b, and hexadecimal numbers by 0x and written in a typewriter font.

Bit values

Values of bits and bitfields are normally given in binary, in single quotes. The quotes are normally omitted in encoding diagrams and tables.

Pseudocode descriptions

This manual uses a form of pseudocode to provide precise descriptions of the specified functionality. This pseudocode is written in a typewriter font, and is described in Appendix I Pseudocode Definition.

Assembler syntax descriptions

This manual contains numerous syntax descriptions for assembler instructions and for components of assembler instructions. These are shown in a typewriter font, and use the conventions described in Assembler syntax on page A8-4.

Further reading

This section lists publications from both ARM and third parties that provide more information on the ARM family of processors.

ARM periodically provides updates and corrections to its documentation. See http://www.arm.com for current errata sheets and addenda, and the ARM Frequently Asked Questions.

ARM publications

• ARM Debug Interface v5 Architecture Specification (ARM IHI 0031)
• ARMv7-M Architecture Reference Manual (ARM DDI 0403)
• CoreSight Architecture Specification (ARM IHI 0029)
• ARM Architecture Reference Manual (ARM DDI 0100I)
  Note:
    - Issue I of the ARM Architecture Reference Manual (DDI 0100I) was issued in July 2005 and describes the first version of the ARMv6 architecture, and all previous architecture versions.
    - Addison-Wesley Professional publish ARM Architecture Reference Manual, Second Edition (December 27, 2000). The contents of this are identical to Issue E of the ARM Architecture Reference Manual (DDI 0100E). It describes ARMv5TE and earlier versions of the ARM architecture, and is superseded by DDI 0100I.
• Embedded Trace Macrocell Architecture Specification (ARM IHI 0014)
• CoreSight Program Flow Trace Architecture Specification (ARM IHI 0035).

External publications

The following books are referred to in this manual, or provide more information:
• IEEE Std 1596.5-1993, IEEE Standard for Shared-Data Formats Optimized for Scalable Coherent Interface (SCI) Processors, ISBN 1-55937-354-7
• IEEE Std 1149.1-2001, IEEE Standard Test Access Port and Boundary Scan Architecture (JTAG)
• ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic
• JEP106, Standard Manufacturers Identification Code, JEDEC Solid State Technology Association
• The Java Virtual Machine Specification Second Edition, Tim Lindholm and Frank Yellin, published by Addison Wesley (ISBN: 0-201-43294-3)
• Memory Consistency Models for Shared Memory-Multiprocessors, Kourosh Gharachorloo, Stanford University Technical Report CSL-TR-95-685

Feedback

ARM welcomes feedback on its documentation.

Feedback on this manual

If you notice any errors or omissions in this manual, send e-mail to errata@arm.com giving:
• the document title
• the document number
• the page number(s) to which your comments apply
• a concise explanation of the problem.

General suggestions for additions and improvements are also welcome.

Part A Application Level Architecture

Chapter A1 Introduction to the ARM Architecture

This chapter introduces the ARM architecture and contains the following sections:
• About the ARM architecture
• The ARM and Thumb instruction sets
• Architecture versions, profiles, and variants
• Architecture extensions
• The ARM memory model
• Debug.

A1.1 About the ARM architecture

The ARM architecture supports implementations across a wide range of performance points. It is established as the dominant architecture in many market segments. The architectural simplicity of ARM processors leads to very small implementations, and small implementations mean devices can have very low power consumption. Implementation size, performance, and very low power consumption are key attributes of the ARM architecture.

The ARM architecture is a Reduced Instruction Set Computer (RISC) architecture, as it incorporates these typical RISC architecture features:
• a large uniform register file
• a load/store architecture, where data-processing operations only operate on register contents, not directly on memory contents
• simple addressing modes, with all load/store addresses being determined from register contents and instruction fields only.

In addition, the ARM architecture provides:
• instructions that combine a shift with an arithmetic or logical operation
• auto-increment and auto-decrement addressing modes to optimize program loops
• Load and Store Multiple instructions to maximize data throughput
• conditional execution of almost all instructions to maximize execution throughput.

These enhancements to a basic RISC architecture enable ARM processors to achieve a good balance of high performance, small code size, low power consumption, and small silicon area.

Except where the architecture specifies differently, the programmer-visible behavior of an implementation must be the same as a simple sequential execution of the program. This programmer-visible behavior does not include the execution time of the program.

A1.2 The ARM and Thumb instruction sets

The ARM instruction set is a set of 32-bit instructions providing comprehensive data-processing and control functions.

The Thumb instruction set was developed as a 16-bit instruction set with a subset of the functionality of the ARM instruction set. It provides significantly improved code density, at a cost of some reduction in performance. A processor executing Thumb instructions can change to executing ARM instructions for performance critical segments, in particular for handling interrupts.

In ARMv6T2, Thumb-2 technology is introduced. This technology makes it possible to extend the original Thumb instruction set with many 32-bit instructions. The range of 32-bit Thumb instructions included in ARMv6T2 permits Thumb code to achieve performance similar to ARM code, with code density better than that of earlier Thumb code.

From ARMv6T2, the ARM and Thumb instruction sets provide almost identical functionality. For more information, see Chapter A4 The Instruction Sets.

A1.3 Architecture versions, profiles, and variants

The ARM and Thumb instruction set architectures have evolved significantly since they were first developed. They will continue to be developed in the future. Seven major versions of the instruction set have been defined to date, denoted by the version numbers 1 to 7. Of these, the first three versions are now obsolete.

ARMv7 provides three profiles:
ARMv7-A
    Application profile, described in this manual. Implements a traditional ARM architecture with multiple modes and supporting a Virtual Memory System Architecture (VMSA) based on an MMU. Supports the ARM and Thumb instruction sets.
ARMv7-R
    Real-time profile, described in this manual.
    Implements a traditional ARM architecture with multiple modes and supporting a Protected Memory System Architecture (PMSA) based on an MPU. Supports the ARM and Thumb instruction sets.
ARMv7-M
    Microcontroller profile, described in the ARMv7-M Architecture Reference Manual. Implements a programmers' model designed for fast interrupt processing, with hardware stacking of registers and support for writing interrupt handlers in high-level languages. Implements a variant of the ARMv7 PMSA and supports a variant of the Thumb instruction set.

Versions can be qualified with variant letters to specify additional instructions and other functionality that are included as an architecture extension. Extensions are typically included in the base architecture of the next version number. Provision is also made to exclude variants by prefixing the variant letter with x.

Some extensions are described separately instead of using a variant letter. For details of these extensions see Architecture extensions on page A1-6.

The valid variants of ARMv4, ARMv5, and ARMv6 are as follows:
ARMv4
    The earliest architecture variant covered by this manual. It includes only the ARM instruction set.
ARMv4T
    Adds the Thumb instruction set.
ARMv5T
    Improves interworking of ARM and Thumb instructions. Adds count leading zeros (CLZ) and software breakpoint (BKPT) instructions.
ARMv5TE
    Enhances arithmetic support for digital signal processing (DSP) algorithms. Adds preload data (PLD), dual word load (LDRD), store (STRD), and 64-bit coprocessor register transfers (MCRR, MRRC).
ARMv5TEJ
    Adds the BXJ instruction and other support for the Jazelle® architecture extension.
ARMv6
    Adds many new instructions to the ARM instruction set. Formalizes and revises the memory model and the Debug architecture.
ARMv6K
    Adds instructions to support multi-processing to the ARM instruction set, and some extra memory model features.
ARMv6T2
    Introduces Thumb-2 technology, giving a major development of the Thumb instruction set to provide a similar level of functionality to the ARM instruction set.

Note: ARMv6KZ or ARMv6Z are sometimes used to describe the ARMv6K architecture with the optional Security Extensions.

For detailed information about versions of the ARM architecture, see Appendix G ARMv6 Differences and Appendix H ARMv4 and ARMv5 Differences.

The following architecture variants are now obsolete: ARMv1, ARMv2, ARMv2a, ARMv3, ARMv3G, ARMv3M, ARMv4xM, ARMv4TxM, ARMv5, ARMv5xM, ARMv5TxM, and ARMv5TExP. Contact ARM if you require details of obsolete variants.

Instruction descriptions in this manual specify the architecture versions that support them.

A1.4 Architecture extensions

This manual describes the following extensions to the ARM and Thumb instruction set architectures:
ThumbEE
    Is a variant of the Thumb instruction set that is designed as a target for dynamically generated code. It is:
    • a required extension to the ARMv7-A profile
    • an optional extension to the ARMv7-R profile.
VFP
    Is a floating-point coprocessor extension to the instruction set architectures. There have been three main versions of VFP to date:
    • VFPv1 is obsolete. Details are available on request from ARM.
    • VFPv2 is an optional extension to:
        - the ARM instruction set in the ARMv5TE, ARMv5TEJ, ARMv6, and ARMv6K architectures
        - the ARM and Thumb instruction sets in the ARMv6T2 architecture.
    • VFPv3 is an optional extension to the ARM, Thumb and ThumbEE instruction sets in the ARMv7-A and ARMv7-R profiles. VFPv3 can be implemented with either thirty-two or sixteen doubleword registers, as described in Advanced SIMD and VFP extension registers on page A2-21.
      Where necessary, the terms VFPv3-D32 and VFPv3-D16 are used to distinguish between these two implementation options. Where the term VFPv3 is used it covers both options.
    VFPv3 can be extended by the half-precision extensions that provide conversion functions in both directions between half-precision floating-point and single-precision floating-point.
Advanced SIMD
    Is an instruction set extension that provides Single Instruction Multiple Data (SIMD) functionality. It is an optional extension to the ARMv7-A and ARMv7-R profiles. When VFPv3 and Advanced SIMD are both implemented, they use a shared register bank and have some shared instructions.
    Advanced SIMD can be extended by the half-precision extensions that provide conversion functions in both directions between half-precision floating-point and single-precision floating-point.
Security Extensions
    Are a set of security features that facilitate the development of secure applications. They are an optional extension to the ARMv6K architecture and the ARMv7-A profile.
Jazelle
    Is the Java bytecode execution extension that extended ARMv5TE to ARMv5TEJ. From ARMv6 Jazelle is a required part of the architecture, but is still often described as the Jazelle extension.
Multiprocessing Extensions
    Are a set of features that enhance multiprocessing functionality. They are an optional extension to the ARMv7-A and ARMv7-R profiles.

A1.5 The ARM memory model

The ARM architecture uses a single, flat address space of 2^32 8-bit bytes. The address space is also regarded as 2^30 32-bit words or 2^31 16-bit halfwords.
The architecture provides facilities for:
• faulting unaligned memory accesses
• restricting access by applications to specified areas of memory
• translating virtual addresses provided by executing instructions into physical addresses
• altering the interpretation of word and halfword data between big-endian and little-endian
• optionally preventing out-of-order access to memory
• controlling caches
• synchronizing access to shared memory by multiple processors.

For more information, see:
• Chapter A3 Application Level Memory Model
• Chapter B2 Common Memory System Architecture Features
• Chapter B3 Virtual Memory System Architecture (VMSA)
• Chapter B4 Protected Memory System Architecture (PMSA).

A1.6 Debug

ARMv7 processors implement two types of debug support:
Invasive debug
    Debug permitting modification of the state of the processor. This is intended primarily for run-control debugging.
Non-invasive debug
    Debug permitting data and program flow observation, without modifying the state of the processor or interrupting the flow of execution. This provides for:
    • instruction and data tracing
    • program counter sampling
    • performance monitors.

For more information, see Chapter C1 Introduction to the ARM Debug Architecture.

Chapter A2 Application Level Programmers’ Model

This chapter gives an application level view of the ARM programmers’ model.
It contains the following sections:
• About the Application level programmers’ model
• ARM core data types and arithmetic
• ARM core registers
• The Application Program Status Register (APSR)
• Execution state registers
• Advanced SIMD and VFP extensions
• Floating-point data types and arithmetic
• Polynomial arithmetic over {0,1}
• Coprocessor support
• Execution environment support
• Exceptions, debug events and checks.

A2.1 About the Application level programmers’ model

This chapter contains the programmers’ model information required for application development.

The information in this chapter is distinct from the system information required to service and support application execution under an operating system. However, some knowledge of that system information is needed to put the Application level programmers' model into context.

System level support requires access to all features and facilities of the architecture, a mode of operation referred to as privileged operation. System code determines whether an application runs in a privileged or unprivileged manner. When an operating system supports both privileged and unprivileged operation, an application usually runs unprivileged. This:
• permits the operating system to allocate system resources to it in a unique or shared manner
• provides a degree of protection from other processes and tasks, and so helps protect the operating system from malfunctioning applications.

This chapter indicates where some system level understanding is helpful, and where appropriate it:
• gives an overview of the system level information
• gives references to the system level descriptions in Chapter B1 The System Level Programmers’ Model and elsewhere.
The Security Extensions extend the architecture to provide hardware security features that support the development of secure applications. For more information, see The Security Extensions on page B1-25.

A2.2 ARM core data types and arithmetic

All ARMv7-A and ARMv7-R processors support the following data types in memory:
Byte        8 bits
Halfword    16 bits
Word        32 bits
Doubleword  64 bits.

Processor registers are 32 bits in size. The instruction set contains instructions supporting the following data types held in registers:
• 32-bit pointers
• unsigned or signed 32-bit integers
• unsigned 16-bit or 8-bit integers, held in zero-extended form
• signed 16-bit or 8-bit integers, held in sign-extended form
• two 16-bit integers packed into a register
• four 8-bit integers packed into a register
• unsigned or signed 64-bit integers held in two registers.

Load and store operations can transfer bytes, halfwords, or words to and from memory. Loads of bytes or halfwords zero-extend or sign-extend the data as it is loaded, as specified in the appropriate load instruction.

The instruction sets include load and store operations that transfer two or more words to and from memory. You can load and store doublewords using these instructions. The exclusive doubleword load/store instructions LDREXD and STREXD specify single-copy atomic doubleword accesses to memory.

When any of the data types is described as unsigned, the N-bit data value represents a non-negative integer in the range 0 to 2^N - 1, using normal binary format.

When any of these types is described as signed, the N-bit data value represents an integer in the range -2^(N-1) to +2^(N-1) - 1, using two's complement format.
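
The unsigned and signed ranges, and the zero-extension or sign-extension applied when a byte or halfword is loaded into a 32-bit register, can be illustrated with a short Python sketch. The helper names uint(), sint(), and load_extend() are invented for this example and are not the manual's pseudocode functions.

```python
def uint(bits: int, n: int) -> int:
    """Read the low n bits as an unsigned integer in 0 .. 2^n - 1."""
    return bits & ((1 << n) - 1)


def sint(bits: int, n: int) -> int:
    """Read the low n bits as a two's complement integer in -2^(n-1) .. 2^(n-1) - 1."""
    raw = uint(bits, n)
    return raw - (1 << n) if raw & (1 << (n - 1)) else raw


def load_extend(bits: int, n: int, signed: bool) -> int:
    """Return the 32-bit register pattern after loading an n-bit value (hypothetical helper)."""
    value = sint(bits, n) if signed else uint(bits, n)
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    print(uint(0xFF, 8), sint(0xFF, 8))
    print(hex(load_extend(0x80, 8, signed=False)), hex(load_extend(0x80, 8, signed=True)))
```

The same byte pattern 0x80 therefore ends up as two different register values, depending only on which kind of load the instruction requests.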
The instructions that operate on packed halfwords or bytes include some multiply instructions that use just one of two halfwords, and Single Instruction Multiple Data (SIMD) instructions that operate on all of the halfwords or bytes in parallel.

Direct instruction support for 64-bit integers is limited, and most 64-bit operations require sequences of two or more instructions to synthesize them.

A2.2.1 Integer arithmetic

The instruction set provides a wide variety of operations on the values in registers, including bitwise logical operations, shifts, additions, subtractions, multiplications, and many others. These operations are defined using the pseudocode described in Appendix I Pseudocode Definition, usually in one of three ways:
• By direct use of the pseudocode operators and built-in functions defined in Operators and built-in functions on page AppxI-11.
• By use of pseudocode helper functions defined in the main text. These can be located using the table in Appendix J Pseudocode Index.
• By a sequence of the form:
  1. Use of the SInt(), UInt(), and Int() built-in functions defined in Converting bitstrings to integers on page AppxI-14 to convert the bitstring contents of the instruction operands to the unbounded integers that they represent as two's complement or unsigned integers.
  2. Use of mathematical operators, built-in functions and helper functions on those unbounded integers to calculate other such integers.
  3. Use of either the bitstring extraction operator defined in Bitstring extraction on page AppxI-12 or of the saturation helper functions described in Pseudocode details of saturation on page A2-9 to convert an unbounded integer result into a bitstring result that can be written to a register.
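
The three-step sequence above can be sketched in Python, whose integers are unbounded in the same way as the pseudocode's. The operation chosen here, a 16-bit signed halving add, is only an illustration of the pattern and is not taken from any instruction description; the names sint(), bit_slice(), and halving_add16() are assumptions of this example.

```python
def sint(bits: int, n: int) -> int:
    """Step 1 helper: read an n-bit bitstring as a two's complement integer."""
    raw = bits & ((1 << n) - 1)
    return raw - (1 << n) if raw & (1 << (n - 1)) else raw


def bit_slice(value: int, hi: int, lo: int) -> int:
    """Step 3 helper: bitstring extraction value<hi:lo> from an unbounded integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def halving_add16(a_bits: int, b_bits: int) -> int:
    """Illustrative operation: (a + b) / 2 on signed 16-bit operands."""
    a = sint(a_bits, 16)            # step 1: bitstrings to unbounded integers
    b = sint(b_bits, 16)
    total = a + b                   # step 2: exact arithmetic, cannot overflow
    return bit_slice(total, 16, 1)  # step 3: take bits <16:1> as the 16-bit result
```

Because the sum is computed exactly before any bits are extracted, no intermediate overflow has to be considered, which is the reason the pseudocode works on unbounded integers.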

Shift and rotate operations

The following types of shift and rotate operations are used in instructions:
Logical Shift Left
    (LSL) moves each bit of a bitstring left by a specified number of bits. Zeros are shifted in at the right end of the bitstring. Bits that are shifted off the left end of the bitstring are discarded, except that the last such bit can be produced as a carry output.
Logical Shift Right
    (LSR) moves each bit of a bitstring right by a specified number of bits. Zeros are shifted in at the left end of the bitstring. Bits that are shifted off the right end of the bitstring are discarded, except that the last such bit can be produced as a carry output.
Arithmetic Shift Right
    (ASR) moves each bit of a bitstring right by a specified number of bits. Copies of the leftmost bit are shifted in at the left end of the bitstring. Bits that are shifted off the right end of the bitstring are discarded, except that the last such bit can be produced as a carry output.
Rotate Right
    (ROR) moves each bit of a bitstring right by a specified number of bits. Each bit that is shifted off the right end of the bitstring is re-introduced at the left end. The last bit shifted off the right end of the bitstring can be produced as a carry output.
Rotate Right with Extend
    (RRX) moves each bit of a bitstring right by one bit. The carry input is shifted in at the left end of the bitstring. The bit shifted off the right end of the bitstring can be produced as a carry output.
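
As an informal companion to these descriptions, the following Python sketch models an N-bit bitstring as a non-negative integer and returns a (result, carry_out) pair for each operation. The lower-case function names and the explicit width argument n are choices made for this example, and each function assumes that x already fits in n bits.

```python
from typing import Tuple


def _mask(n: int) -> int:
    return (1 << n) - 1


def lsl_c(x: int, n: int, shift: int) -> Tuple[int, int]:
    assert shift > 0
    extended = x << shift                      # zeros enter at the right
    return extended & _mask(n), (extended >> n) & 1


def lsr_c(x: int, n: int, shift: int) -> Tuple[int, int]:
    assert shift > 0
    return (x >> shift) & _mask(n), (x >> (shift - 1)) & 1


def asr_c(x: int, n: int, shift: int) -> Tuple[int, int]:
    assert shift > 0
    signed = x - (1 << n) if x & (1 << (n - 1)) else x   # copies of the top bit enter at the left
    return (signed >> shift) & _mask(n), (signed >> (shift - 1)) & 1


def ror_c(x: int, n: int, shift: int) -> Tuple[int, int]:
    assert shift != 0
    m = shift % n
    result = ((x >> m) | (x << (n - m))) & _mask(n)
    return result, (result >> (n - 1)) & 1     # carry is the last bit rotated round


def rrx_c(x: int, n: int, carry_in: int) -> Tuple[int, int]:
    return (carry_in << (n - 1)) | (x >> 1), x & 1
```

For example, lsl_c(0x80000001, 32, 1) discards the top bit into the carry and returns (0x00000002, 1).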

Pseudocode details of shift and rotate operations

These shift and rotate operations are supported in pseudocode by the following functions:

    // LSL_C()
    // =======

    (bits(N), bit) LSL_C(bits(N) x, integer shift)
        assert shift > 0;
        extended_x = x : Zeros(shift);
        result = extended_x<N-1:0>;
        carry_out = extended_x<N>;
        return (result, carry_out);

    // LSL()
    // =====

    bits(N) LSL(bits(N) x, integer shift)
        assert shift >= 0;
        if shift == 0 then
            result = x;
        else
            (result, -) = LSL_C(x, shift);
        return result;

    // LSR_C()
    // =======

    (bits(N), bit) LSR_C(bits(N) x, integer shift)
        assert shift > 0;
        extended_x = ZeroExtend(x, shift+N);
        result = extended_x<shift+N-1:shift>;
        carry_out = extended_x<shift-1>;
        return (result, carry_out);

    // LSR()
    // =====

    bits(N) LSR(bits(N) x, integer shift)
        assert shift >= 0;
        if shift == 0 then
            result = x;
        else
            (result, -) = LSR_C(x, shift);
        return result;

    // ASR_C()
    // =======

    (bits(N), bit) ASR_C(bits(N) x, integer shift)
        assert shift > 0;
        extended_x = SignExtend(x, shift+N);
        result = extended_x<shift+N-1:shift>;
        carry_out = extended_x<shift-1>;
        return (result, carry_out);

    // ASR()
    // =====

    bits(N) ASR(bits(N) x, integer shift)
        assert shift >= 0;
        if shift == 0 then
            result = x;
        else
            (result, -) = ASR_C(x, shift);
        return result;

    // ROR_C()
    // =======

    (bits(N), bit) ROR_C(bits(N) x, integer shift)
        assert shift != 0;
        m = shift MOD N;
        result = LSR(x,m) OR LSL(x,N-m);
        carry_out = result<N-1>;
        return (result, carry_out);

    // ROR()
    // =====

    bits(N) ROR(bits(N) x, integer shift)
        if shift == 0 then
            result = x;
        else
            (result, -) = ROR_C(x, shift);
        return result;

    // RRX_C()
    // =======

    (bits(N), bit) RRX_C(bits(N) x, bit carry_in)
        result = carry_in : x<N-1:1>;
        carry_out = x<0>;
        return (result, carry_out);

    // RRX()
    // =====

    bits(N) RRX(bits(N) x, bit carry_in)
        (result, -) = RRX_C(x, carry_in);
        return result;

Pseudocode details of addition and subtraction

In pseudocode, addition and subtraction can be performed on any combination of unbounded integers and bitstrings, provided that if they are performed on two bitstrings, the bitstrings must be identical in length. The result is another unbounded integer if both operands are unbounded integers, and a bitstring of the same length as the bitstring operand(s) otherwise. For the precise definition of these operations, see Addition and subtraction on page AppxI-15.

The main addition and subtraction instructions can produce status information about both unsigned carry and signed overflow conditions. This status information can be used to synthesize multi-word additions and subtractions.
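
A minimal Python sketch of how carry status lets 32-bit operations be chained into 64-bit ones is given here. The function add_with_carry() models an N-bit add with a carry input and carry and overflow outputs; add64() and sub64() are hypothetical helpers written for this example, and sub64() relies on the ordinary two's complement identity that x - y equals x + NOT(y) + 1.

```python
from typing import Tuple

M32 = 0xFFFFFFFF


def add_with_carry(x: int, y: int, carry_in: int, n: int = 32) -> Tuple[int, int, int]:
    """N-bit add returning (result, carry_out, overflow)."""
    def sint(v: int) -> int:
        return v - (1 << n) if v & (1 << (n - 1)) else v

    unsigned_sum = x + y + carry_in
    signed_sum = sint(x) + sint(y) + carry_in
    result = unsigned_sum & ((1 << n) - 1)
    carry_out = 0 if result == unsigned_sum else 1
    overflow = 0 if sint(result) == signed_sum else 1
    return result, carry_out, overflow


def add64(a: int, b: int) -> int:
    """64-bit add from two 32-bit adds; the low carry feeds the high word."""
    lo, carry, _ = add_with_carry(a & M32, b & M32, 0)
    hi, _, _ = add_with_carry((a >> 32) & M32, (b >> 32) & M32, carry)
    return (hi << 32) | lo


def sub64(a: int, b: int) -> int:
    """64-bit subtract: add the complement of b, starting with carry-in 1."""
    lo, carry, _ = add_with_carry(a & M32, ~b & M32, 1)
    hi, _, _ = add_with_carry((a >> 32) & M32, (~b >> 32) & M32, carry)
    return (hi << 32) | lo
```

In sub64() a carry-out of 0 from the low word means a borrow occurred, and passing it straight on as the high word's carry-in subtracts that borrow.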
In pseudocode the AddWithCarry() function provides an addition with a carry input and carry and overflow outputs:

    // AddWithCarry()
    // ==============

    (bits(N), bit, bit) AddWithCarry(bits(N) x, bits(N) y, bit carry_in)
        unsigned_sum = UInt(x) + UInt(y) + UInt(carry_in);
        signed_sum = SInt(x) + SInt(y) + UInt(carry_in);
        result = unsigned_sum<N-1:0>; // == signed_sum<N-1:0>
        carry_out = if UInt(result) == unsigned_sum then '0' else '1';
        overflow = if SInt(result) == signed_sum then '0' else '1';
        return (result, carry_out, overflow);

An important property of the AddWithCarry() function is that if:

    (result, carry_out, overflow) = AddWithCarry(x, NOT(y), carry_in)

then:
• if carry_in == '1', then result == x-y with:
    - overflow == '1' if signed overflow occurred during the subtraction
    - carry_out == '1' if unsigned borrow did not occur during the subtraction, that is, if x >= y
• if carry_in == '0', then result == x-y-1 with:
    - overflow == '1' if signed overflow occurred during the subtraction
    - carry_out == '1' if unsigned borrow did not occur during the subtraction, that is, if x > y.

Together, these mean that the carry_in and carry_out bits in AddWithCarry() calls can act as NOT borrow flags for subtractions as well as carry flags for additions.

Pseudocode details of saturation

Some instructions perform saturating arithmetic, that is, if the result of the arithmetic overflows the destination signed or unsigned N-bit integer range, the result produced is the largest or smallest value in that range, rather than wrapping around modulo 2^N.
This is supported in pseudocode by the SignedSatQ() and UnsignedSatQ() functions when a boolean result is wanted saying whether saturation occurred, and by the SignedSat() and UnsignedSat() functions when only the saturated result is wanted:

    // SignedSatQ()
    // ============

    (bits(N), boolean) SignedSatQ(integer i, integer N)
        if i > 2^(N-1) - 1 then
            result = 2^(N-1) - 1; saturated = TRUE;
        elsif i < -(2^(N-1)) then
            result = -(2^(N-1)); saturated = TRUE;
        else
            result = i; saturated = FALSE;
        return (result<N-1:0>, saturated);

    // UnsignedSatQ()
    // ==============

    (bits(N), boolean) UnsignedSatQ(integer i, integer N)
        if i > 2^N - 1 then
            result = 2^N - 1; saturated = TRUE;
        elsif i < 0 then
            result = 0; saturated = TRUE;
        else
            result = i; saturated = FALSE;
        return (result<N-1:0>, saturated);

    // SignedSat()
    // ===========

    bits(N) SignedSat(integer i, integer N)
        (result, -) = SignedSatQ(i, N);
        return result;

    // UnsignedSat()
    // =============

    bits(N) UnsignedSat(integer i, integer N)
        (result, -) = UnsignedSatQ(i, N);
        return result;

SatQ(i, N, unsigned) returns either UnsignedSatQ(i, N) or SignedSatQ(i, N) depending on the value of its third argument, and Sat(i, N, unsigned) returns either UnsignedSat(i, N) or SignedSat(i, N) depending on the value of its third argument:

    // SatQ()
    // ======

    (bits(N), boolean) SatQ(integer i, integer N, boolean unsigned)
        (result, sat) = if unsigned then UnsignedSatQ(i, N) else SignedSatQ(i, N);
        return (result, sat);

    // Sat()
    // =====

    bits(N) Sat(integer i, integer N, boolean unsigned)
        result = if unsigned then UnsignedSat(i, N) else SignedSat(i, N);
        return result;
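
The saturation helpers can be mirrored in Python as a quick way to experiment with them. This is a sketch only: the lower-case names are this example's own, and the returned result is the N-bit pattern expressed as a non-negative integer, so a saturated negative value appears in its two's complement form.

```python
from typing import Tuple


def signed_sat_q(i: int, n: int) -> Tuple[int, bool]:
    """Clamp i to -2^(n-1) .. 2^(n-1) - 1 and return (n-bit pattern, saturated)."""
    hi = (1 << (n - 1)) - 1
    lo = -(1 << (n - 1))
    if i > hi:
        result, saturated = hi, True
    elif i < lo:
        result, saturated = lo, True
    else:
        result, saturated = i, False
    return result & ((1 << n) - 1), saturated


def unsigned_sat_q(i: int, n: int) -> Tuple[int, bool]:
    """Clamp i to 0 .. 2^n - 1 and return (n-bit pattern, saturated)."""
    hi = (1 << n) - 1
    if i > hi:
        return hi, True
    if i < 0:
        return 0, True
    return i, False


def sat_q(i: int, n: int, unsigned: bool) -> Tuple[int, bool]:
    """Select the unsigned or signed variant, as SatQ() does."""
    return unsigned_sat_q(i, n) if unsigned else signed_sat_q(i, n)
```

For instance, sat_q(200, 8, unsigned=False) returns (0x7F, True), whereas wrapping arithmetic would have produced the pattern 0xC8.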
A2.3 ARM core registers

In the application level view, an ARM processor has:

• thirteen general-purpose 32-bit registers, R0 to R12
• three 32-bit registers, R13 to R15, that sometimes or always have a special use.

Registers R13 to R15 are usually referred to by names that indicate their special uses:

SP, the Stack Pointer
    Register R13 is used as a pointer to the active stack.
    In Thumb code, most instructions cannot access SP. The only instructions that can access SP are those designed to use SP as a stack pointer.
    The use of SP for any purpose other than as a stack pointer is deprecated.

    Note
    Using SP for any purpose other than as a stack pointer is likely to break the requirements of operating systems, debuggers, and other software systems, causing them to malfunction.

LR, the Link Register
    Register R14 is used to store the return address from a subroutine. At other times, LR can be used for other purposes.
    When a BL or BLX instruction performs a subroutine call, LR is set to the subroutine return address. To perform a subroutine return, copy LR back to the program counter. This is typically done in one of two ways, after entering the subroutine with a BL or BLX instruction:
    • Return with a BX LR instruction.
    • On subroutine entry, store LR to the stack with an instruction of the form:
          PUSH {<registers>,LR}
      and use a matching instruction to return:
          POP {<registers>,PC}
    ThumbEE checks and handler calls use LR in a similar way. For details see Chapter A9 ThumbEE.

PC, the Program Counter
    Register R15 is the program counter:
    • When executing an ARM instruction, PC reads as the address of the current instruction plus 8.
    • When executing a Thumb instruction, PC reads as the address of the current instruction plus 4.
    • Writing an address to PC causes a branch to that address.
    In Thumb code, most instructions cannot access PC.

See ARM core registers on page B1-9 for the system level view of SP, LR, and PC.
Note
The names SP, LR and PC are preferred to R13, R14 and R15. However, sometimes it is simpler to use the R13-R15 names when referring to a group of registers. For example, it is simpler to refer to Registers R8 to R15, rather than to Registers R8 to R12, the SP, LR and PC. However, these two descriptions of the group of registers have exactly the same meaning.

A2.3.1 Pseudocode details of operations on ARM core registers

In pseudocode, the R[] function is used to:
• Read or write R0-R12, SP, and LR, using n == 0-12, 13, and 14 respectively.
• Read the PC, using n == 15.

This function has prototypes:

    bits(32) R[integer n]
        assert n >= 0 && n <= 15;

    R[integer n] = bits(32) value
        assert n >= 0 && n <= 14;

The full operation of this function is explained in Pseudocode details of ARM core register operations on page B1-12.

Descriptions of ARM store instructions that store the PC value use the PCStoreValue() pseudocode function to specify the PC value stored by the instruction:

    // PCStoreValue()
    // ==============

    bits(32) PCStoreValue()
        // This function returns the PC value. On architecture versions before ARMv7, it
        // is permitted to instead return PC+4, provided it does so consistently. It is
        // used only to describe ARM instructions, so it returns the address of the current
        // instruction plus 8 (normally) or 12 (when the alternative is permitted).
        return PC;

Writing an address to the PC causes either a simple branch to that address or an interworking branch that also selects the instruction set to execute after the branch.
A simple branch is performed by the BranchWritePC() function:

    // BranchWritePC()
    // ===============

    BranchWritePC(bits(32) address)
        if CurrentInstrSet() == InstrSet_ARM then
            if ArchVersion() < 6 && address<1:0> != '00' then UNPREDICTABLE;
            BranchTo(address<31:2>:'00');
        else
            BranchTo(address<31:1>:'0');

An interworking branch is performed by the BXWritePC() function:

    // BXWritePC()
    // ===========

    BXWritePC(bits(32) address)
        if CurrentInstrSet() == InstrSet_ThumbEE then
            if address<0> == '1' then
                BranchTo(address<31:1>:'0'); // Remaining in ThumbEE state
            else
                UNPREDICTABLE;
        else
            if address<0> == '1' then
                SelectInstrSet(InstrSet_Thumb);
                BranchTo(address<31:1>:'0');
            elsif address<1> == '0' then
                SelectInstrSet(InstrSet_ARM);
                BranchTo(address);
            else // address<1:0> == '10'
                UNPREDICTABLE;

The LoadWritePC() and ALUWritePC() functions are used for two cases where the behavior was systematically modified between architecture versions:

    // LoadWritePC()
    // =============

    LoadWritePC(bits(32) address)
        if ArchVersion() >= 5 then
            BXWritePC(address);
        else
            BranchWritePC(address);

    // ALUWritePC()
    // ============

    ALUWritePC(bits(32) address)
        if ArchVersion() >= 7 && CurrentInstrSet() == InstrSet_ARM then
            BXWritePC(address);
        else
            BranchWritePC(address);

Note
The behavior of the PC writes performed by the ALUWritePC() function is different in Debug state, where there are more UNPREDICTABLE cases. The pseudocode in this section only handles the non-debug cases. For more information, see Data-processing instructions with the PC as the target in Debug state on page C5-12.

A2.4 The Application Program Status Register (APSR)

Program status is reported in the 32-bit Application Program Status Register (APSR).
The format of the APSR is:

    Bits      Field
    [31]      N
    [30]      Z
    [29]      C
    [28]      V
    [27]      Q
    [26:24]   RAZ/SBZP
    [23:20]   Reserved
    [19:16]   GE[3:0]
    [15:0]    Reserved

In the APSR, the bits are in the following categories:

• Reserved bits are allocated to system features, or are available for future expansion. Unprivileged execution ignores writes to privileged fields. However, application level software that writes to the APSR must treat reserved bits as Do-Not-Modify (DNM) bits. For more information about the reserved bits, see Format of the CPSR and SPSRs on page B1-16.

• Flags that can be set by many instructions:

    N, bit [31]    Negative condition code flag. Set to bit [31] of the result of the instruction. If the result is regarded as a two's complement signed integer, then N == 1 if the result is negative and N == 0 if it is positive or zero.
    Z, bit [30]    Zero condition code flag. Set to 1 if the result of the instruction is zero, and to 0 otherwise. A result of zero often indicates an equal result from a comparison.
    C, bit [29]    Carry condition code flag. Set to 1 if the instruction results in a carry condition, for example an unsigned overflow on an addition.
    V, bit [28]    Overflow condition code flag. Set to 1 if the instruction results in an overflow condition, for example a signed overflow on an addition.
    Q, bit [27]    Set to 1 to indicate overflow or saturation occurred in some instructions, normally related to Digital Signal Processing (DSP). For more information, see Pseudocode details of saturation on page A2-9.
    GE[3:0], bits [19:16]
                   Greater than or Equal flags. SIMD instructions update these flags to indicate the results from individual bytes or halfwords of the operation. These flags can control a later SEL instruction. For more information, see SEL on page A8-312.

• Bits [26:24] are RAZ/SBZP. Therefore, software can use MSR instructions that write the top byte of the APSR without using a read, modify, write sequence. If it does this, it must write zeros to bits [26:24].
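The APSR layout above can be illustrated by decoding a raw 32-bit value into its application level fields. This is a minimal sketch: the ApsrFlags type and the decode_apsr name are hypothetical helpers, and the input is assumed to be a value that software has already read from the register.

```python
from typing import NamedTuple


class ApsrFlags(NamedTuple):
    """Application level APSR fields (hypothetical container type)."""
    n: int   # bit [31], negative
    z: int   # bit [30], zero
    c: int   # bit [29], carry
    v: int   # bit [28], overflow
    q: int   # bit [27], cumulative saturation
    ge: int  # bits [19:16], Greater than or Equal flags


def decode_apsr(value: int) -> ApsrFlags:
    """Extract N, Z, C, V, Q and GE[3:0] from a 32-bit APSR value."""
    return ApsrFlags(
        n=(value >> 31) & 1,
        z=(value >> 30) & 1,
        c=(value >> 29) & 1,
        v=(value >> 28) & 1,
        q=(value >> 27) & 1,
        ge=(value >> 16) & 0xF,
    )


flags = decode_apsr(0x600F0000)
print(flags)  # Z and C set, all four GE flags set
```

Reserved bits and the RAZ/SBZP bits [26:24] are deliberately not exposed, in line with the rule that application level software uses the APSR only for the N, Z, C, V, Q, and GE[3:0] bits.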
Instructions can test the N, Z, C, and V condition code flags to determine whether the instruction is to be executed. In this way, execution of the instruction can be made conditional on the result of a previous operation. For more information about conditional execution see Conditional execution on page A4-3 and Conditional execution on page A8-8.

In ARMv7-A and ARMv7-R, the APSR is the same register as the CPSR, but the APSR must be used only to access the N, Z, C, V, Q, and GE[3:0] bits. For more information, see Program Status Registers (PSRs) on page B1-14.

A2.5 Execution state registers

The execution state registers modify the execution of instructions. They control:

• Whether instructions are interpreted as Thumb instructions, ARM instructions, ThumbEE instructions, or Java bytecodes. For more information, see ISETSTATE.
• In Thumb state and ThumbEE state only, what conditions apply to the next four instructions. For more information, see ITSTATE on page A2-17.
• Whether data is interpreted as big-endian or little-endian. For more information, see ENDIANSTATE on page A2-19.

In ARMv7-A and ARMv7-R, the execution state registers are part of the Current Program Status Register. For more information, see Program Status Registers (PSRs) on page B1-14.

There is no direct access to the execution state registers from application level instructions, but they can be changed by side effects of application level instructions.

A2.5.1 ISETSTATE

    Bit     1   0
    Field   J   T

The J bit and the T bit determine the instruction set used by the processor. Table A2-1 shows the encoding of these bits.

Table A2-1 J and T bit encoding in ISETSTATE

    J   T   Instruction set state
    0   0   ARM
    0   1   Thumb
    1   0   Jazelle
    1   1   ThumbEE

ARM state
    The processor executes the ARM instruction set described in Chapter A5 ARM Instruction Set Encoding.

Thumb state
    The processor executes the Thumb instruction set as described in Chapter A6 Thumb Instruction Set Encoding.

Jazelle state
    The processor executes Java bytecodes as part of a Java Virtual Machine (JVM). For more information, see Jazelle direct bytecode execution support on page A2-73.

ThumbEE state
    The processor executes a variation of the Thumb instruction set specifically targeted for use with dynamic compilation techniques associated with an execution environment. This can be Java or other execution environments. This feature is required in ARMv7-A, and optional in ARMv7-R. For more information, see Thumb Execution Environment on page A2-69.

Pseudocode details of ISETSTATE operations

The following pseudocode functions return the current instruction set and select a new instruction set:

    enumeration InstrSet {InstrSet_ARM, InstrSet_Thumb, InstrSet_Jazelle, InstrSet_ThumbEE};

    // CurrentInstrSet()
    // =================

    InstrSet CurrentInstrSet()
        case ISETSTATE of
            when '00' result = InstrSet_ARM;
            when '01' result = InstrSet_Thumb;
            when '10' result = InstrSet_Jazelle;
            when '11' result = InstrSet_ThumbEE;
        return result;

    // SelectInstrSet()
    // ================

    SelectInstrSet(InstrSet iset)
        case iset of
            when InstrSet_ARM
                if CurrentInstrSet() == InstrSet_ThumbEE then
                    UNPREDICTABLE;
                else
                    ISETSTATE = '00';
            when InstrSet_Thumb
                ISETSTATE = '01';
            when InstrSet_Jazelle
                ISETSTATE = '10';
            when InstrSet_ThumbEE
                ISETSTATE = '11';
        return;

A2.5.2 ITSTATE

    Bits    7 6 5 4 3 2 1 0
    Field   IT[7:0]

This field holds the If-Then execution state bits for the Thumb IT instruction. See IT on page A8-104 for a description of the IT instruction and the associated IT block.

ITSTATE divides into two subfields:

IT[7:5]    Holds the base condition for the current IT block.
           The base condition is the top 3 bits of the condition specified by the IT instruction.
           This subfield is 0b000 when no IT block is active.

IT[4:0]    Encodes:
           • The size of the IT block. This is the number of instructions that are to be conditionally executed. The size of the block is implied by the position of the least significant 1 in this field, as shown in Table A2-2 on page A2-18.
           • The value of the least significant bit of the condition code for each instruction in the block.

             Note
             Changing the value of the least significant bit of a condition code from 0 to 1 has the effect of inverting the condition code.

           This subfield is 0b00000 when no IT block is active.

When an IT instruction is executed, these bits are set according to the condition in the instruction, and the Then and Else (T and E) parameters in the instruction. For more information, see IT on page A8-104.

An instruction in an IT block is conditional, see Conditional instructions on page A4-4 and Conditional execution on page A8-8. The condition used is the current value of IT[7:4]. When an instruction in an IT block completes its execution normally, ITSTATE is advanced to the next line of Table A2-2 on page A2-18.

For details of what happens if such an instruction takes an exception see Exception entry on page B1-34.

Note
Instructions that can complete their normal execution by branching are only permitted in an IT block as its last instruction, and so always result in ITSTATE advancing to normal execution.

Note
ITSTATE affects instruction execution only in Thumb and ThumbEE states. In ARM and Jazelle states, ITSTATE must be '00000000', otherwise behavior is UNPREDICTABLE.
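The ITSTATE encoding described above can be made concrete with a short Python sketch that walks an IT block one instruction at a time. The function names are illustrative only, ITSTATE is modelled as a plain 8-bit integer, and the starting value 0x06 is an assumed example (base condition 0b000, a 3-instruction block whose third instruction uses the inverted condition). The advance step follows the ITAdvance() pseudocode given in this section.

```python
def it_condition(itstate: int) -> int:
    """Condition applied to the current instruction: IT[7:4]."""
    return (itstate >> 4) & 0xF


def it_remaining(itstate: int) -> int:
    """Instructions left in the block, from the lowest set bit of IT[4:0]."""
    low5 = itstate & 0x1F
    if low5 == 0:
        return 0  # normal execution, not in an IT block
    lowest_one = (low5 & -low5).bit_length() - 1
    return 4 - lowest_one


def it_advance(itstate: int) -> int:
    """Mirror of ITAdvance(): shift IT[4:0] left, or clear at the end."""
    if itstate & 0x7 == 0:
        return 0
    return (itstate & 0xE0) | ((itstate << 1) & 0x1F)


def trace_it_block(itstate: int):
    """List (condition, instructions remaining) for each block instruction."""
    steps = []
    while itstate & 0xF:  # same test as InITBlock()
        steps.append((it_condition(itstate), it_remaining(itstate)))
        itstate = it_advance(itstate)
    return steps


# Conditions 0b0000 and 0b0001 are EQ and NE, so this is Then, Then, Else.
print(trace_it_block(0x06))
```

Because the per-instruction condition bits are shifted up through IT[4], reading IT[7:4] always yields the full condition for the instruction about to execute, which is why no separate counter is needed.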
Table A2-2 Effect of IT execution state bits

    IT bits (a)                                Note
    [7:5]       [4]  [3]  [2]  [1]  [0]
    cond_base   P1   P2   P3   P4   1          Entry point for 4-instruction IT block
    cond_base   P1   P2   P3   1    0          Entry point for 3-instruction IT block
    cond_base   P1   P2   1    0    0          Entry point for 2-instruction IT block
    cond_base   P1   1    0    0    0          Entry point for 1-instruction IT block
    000         0    0    0    0    0          Normal execution, not in an IT block

    a. Combinations of the IT bits not shown in this table are reserved.

Pseudocode details of ITSTATE operations

ITSTATE advances after normal execution of an IT block instruction. This is described by the ITAdvance() pseudocode function:

    // ITAdvance()
    // ===========

    ITAdvance()
        if ITSTATE<2:0> == '000' then
            ITSTATE.IT = '00000000';
        else
            ITSTATE.IT<4:0> = LSL(ITSTATE.IT<4:0>, 1);

The following functions test whether the current instruction is in an IT block, and whether it is the last instruction of an IT block:

    // InITBlock()
    // ===========

    boolean InITBlock()
        return (ITSTATE.IT<3:0> != '0000');

    // LastInITBlock()
    // ===============

    boolean LastInITBlock()
        return (ITSTATE.IT<3:0> == '1000');

A2.5.3 ENDIANSTATE

ARMv7-A and ARMv7-R support configuration between little-endian and big-endian interpretations of data memory, as shown in Table A2-3. The endianness is controlled by ENDIANSTATE.

Table A2-3 APSR configuration of endianness

    ENDIANSTATE   Endian mapping
    0             Little-endian
    1             Big-endian

The ARM and Thumb instruction sets both include an instruction to manipulate ENDIANSTATE:

    SETEND BE    Sets ENDIANSTATE to 1, for big-endian operation
    SETEND LE    Sets ENDIANSTATE to 0, for little-endian operation.

The SETEND instruction is unconditional. For more information, see SETEND on page A8-314.
Pseudocode details of ENDIANSTATE operations

The BigEndian() pseudocode function tests whether big-endian memory accesses are currently selected.

    // BigEndian()
    // ===========

    boolean BigEndian()
        return (ENDIANSTATE == '1');

A2.6 Advanced SIMD and VFP extensions

Advanced SIMD and VFP are two optional extensions to ARMv7.

Advanced SIMD performs packed Single Instruction Multiple Data (SIMD) operations, either integer or single-precision floating-point. VFP performs single-precision or double-precision floating-point operations.

Both extensions permit floating-point exceptions, such as overflow or division by zero, to be handled in an untrapped fashion. When handled in this way, a floating-point exception causes a cumulative status register bit to be set to 1 and a default result to be produced by the operation.

The ARMv7 VFP implementation is VFPv3. ARMv7 also permits a variant of VFPv3, VFPv3U, that supports the trapping of floating-point exceptions, see VFPv3U on page A2-31. VFPv2 also supports the trapping of floating-point exceptions.

For more information about floating-point exceptions see Floating-point exceptions on page A2-42.

Each extension can be implemented at a number of levels. Table A2-4 shows the permitted combinations of implementations of the two extensions.

Table A2-4 Permitted combinations of Advanced SIMD and VFP extensions

    Advanced SIMD                                  VFP
    Not implemented                                Not implemented
    Integer only                                   Not implemented
    Integer and single-precision floating-point    Single-precision floating-point only (a)
    Integer and single-precision floating-point    Single-precision and double-precision floating-point
    Not implemented                                Single-precision floating-point only (a)
    Not implemented                                Single-precision and double-precision floating-point

    a. Must be able to load and store double-precision data.
The optional half-precision extensions provide conversion functions in both directions between half-precision floating-point and single-precision floating-point. These extensions can be implemented with any Advanced SIMD and VFP implementation that supports single-precision floating-point. The half-precision extensions apply to both VFP and Advanced SIMD if they are both implemented.

For system-level information about the Advanced SIMD and VFP extensions see:
• Advanced SIMD and VFP extension system registers on page B1-66
• Advanced SIMD and floating-point support on page B1-64.

Note
Before ARMv7, the VFP extension was called the Vector Floating-point Architecture, and was used for vector operations. For details of these deprecated operations see Appendix F VFP Vector Operation Support. From ARMv7:
• ARM recommends that the Advanced SIMD extension is used for single-precision vector floating-point operations
• an implementation that requires support for vector operations must implement the Advanced SIMD extension.

A2.6.1 Advanced SIMD and VFP extension registers

Advanced SIMD and VFPv3 use the same register set. This is distinct from the ARM core register set. These registers are generally referred to as the extension registers.

The extension register set consists of either thirty-two or sixteen doubleword registers, as follows:
• If VFPv2 is implemented, it consists of sixteen doubleword registers.
• If VFPv3 is implemented, it consists of either thirty-two or sixteen doubleword registers. Where necessary the terms VFPv3-D32 and VFPv3-D16 are used to distinguish between these two implementation options.
• If Advanced SIMD is implemented, it consists of thirty-two doubleword registers. If both Advanced SIMD and VFPv3 are implemented, VFPv3 must be implemented in its VFPv3-D32 form.
The Advanced SIMD and VFP views of the extension register set are not identical. They are described in the following sections.

Figure A2-1 on page A2-22 shows the views of the extension register set, and the way the word, doubleword, and quadword registers overlap.

Advanced SIMD views of the extension register set

Advanced SIMD can view this register set as:
• Sixteen 128-bit quadword registers, Q0-Q15.
• Thirty-two 64-bit doubleword registers, D0-D31. This view is also available in VFPv3.

These views can be used simultaneously. For example, a program might hold 64-bit vectors in D0 and D1 and a 128-bit vector in Q1.

VFP views of the extension register set

In VFPv3-D32, the extension register set consists of thirty-two doubleword registers, that VFP can view as:
• Thirty-two 64-bit doubleword registers, D0-D31. This view is also available in Advanced SIMD.
• Thirty-two 32-bit single word registers, S0-S31. Only half of the set is accessible in this view.

In VFPv3-D16 and VFPv2, the extension register set consists of sixteen doubleword registers, that VFP can view as:
• Sixteen 64-bit doubleword registers, D0-D15.
• Thirty-two 32-bit single word registers, S0-S31.

In each case, the two views can be used simultaneously.

Advanced SIMD and VFP register mapping

[Figure A2-1 Advanced SIMD and VFP register set. The figure shows four overlapping views side by side: S0-S31 (VFP only), D0-D15 (VFPv2 or VFPv3-D16), D0-D31 (VFPv3-D32 or Advanced SIMD), and Q0-Q15 (Advanced SIMD only).]
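The overlap between the S, D, and Q views can be illustrated with a small Python model of the register bank. This is a sketch only: the ExtensionRegisters class and its method names are hypothetical, registers are held as Python integers, and a VFPv3-D32 implementation with Advanced SIMD is assumed so that all of D0-D31 and Q0-Q15 exist. It relies on the mapping defined in this section, in which each doubleword register holds two single word registers and each quadword register holds two doubleword registers.

```python
class ExtensionRegisters:
    """Toy model of the shared Advanced SIMD and VFP register bank."""

    def __init__(self):
        self._d = [0] * 32  # thirty-two 64-bit doubleword registers

    def read_d(self, n: int) -> int:
        return self._d[n]

    def write_d(self, n: int, value: int) -> None:
        self._d[n] = value & 0xFFFFFFFFFFFFFFFF

    def read_s(self, n: int) -> int:
        """S<2k> is the low word of D<k>, S<2k+1> the high word."""
        shift = 32 if n % 2 else 0
        return (self._d[n // 2] >> shift) & 0xFFFFFFFF

    def write_s(self, n: int, value: int) -> None:
        shift = 32 if n % 2 else 0
        cleared = self._d[n // 2] & ~(0xFFFFFFFF << shift)
        self._d[n // 2] = cleared | ((value & 0xFFFFFFFF) << shift)

    def read_q(self, n: int) -> int:
        """Q<k> is D<2k+1> (high half) concatenated with D<2k> (low half)."""
        return (self._d[2 * n + 1] << 64) | self._d[2 * n]


regs = ExtensionRegisters()
regs.write_s(2, 0x11111111)  # low word of D1
regs.write_s(3, 0x22222222)  # high word of D1
print(hex(regs.read_d(1)))   # the two S writes are visible through D1
print(hex(regs.read_q(0)))   # D1 is also the high half of Q0
```

Storing only the doubleword array and deriving the other views from it matches the way the architecture pseudocode defines S[] and Q[] in terms of D[].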
The mapping between the registers is as follows:
• S<2n> maps to the least significant half of D<n>
• S<2n+1> maps to the most significant half of D<n>
• D<2n> maps to the least significant half of Q<n>
• D<2n+1> maps to the most significant half of Q<n>.

For example, you can access the least significant half of the elements of a vector in Q6 by referring to D12, and the most significant half of the elements by referring to D13.

Pseudocode details of Advanced SIMD and VFP extension registers

The pseudocode function VFPSmallRegisterBank() returns FALSE if all of the 32 registers D0-D31 can be accessed, and TRUE if only the 16 registers D0-D15 can be accessed:

    boolean VFPSmallRegisterBank()

In more detail, VFPSmallRegisterBank():
• returns TRUE for a VFPv2 or VFPv3-D16 implementation
• for a VFPv3-D32 implementation:
    - returns FALSE if CPACR.D32DIS == 0
    - returns TRUE if CPACR.D32DIS == 1 and CPACR.ASEDIS == 1
    - results in UNPREDICTABLE behavior if CPACR.D32DIS == 1 and CPACR.ASEDIS == 0.

For details of the CPACR register, see:
• c1, Coprocessor Access Control Register (CPACR) on page B3-104 for a VMSA implementation
• c1, Coprocessor Access Control Register (CPACR) on page B4-51 for a PMSA implementation.

The S0-S31, D0-D31, and Q0-Q15 views of the registers are provided by the following functions:

    // The 64-bit extension register bank for Advanced SIMD and VFP.
    array bits(64) _D[0..31];

    // S[] - non-assignment form
    // =========================

    bits(32) S[integer n]
        assert n >= 0 && n <= 31;
        if (n MOD 2) == 0 then
            result = D[n DIV 2]<31:0>;
        else
            result = D[n DIV 2]<63:32>;
        return result;

    // S[] - assignment form
    // =====================

    S[integer n] = bits(32) value
        assert n >= 0 && n <= 31;
        if (n MOD 2) == 0 then
            D[n DIV 2]<31:0> = value;
        else
            D[n DIV 2]<63:32> = value;
        return;

    // D[] - non-assignment form
    // =========================

    bits(64) D[integer n]
        assert n >= 0 && n <= 31;
        if n >= 16 && VFPSmallRegisterBank() then UNDEFINED;
        return _D[n];

    // D[] - assignment form
    // =====================

    D[integer n] = bits(64) value
        assert n >= 0 && n <= 31;
        if n >= 16 && VFPSmallRegisterBank() then UNDEFINED;
        _D[n] = value;
        return;

    // Q[] - non-assignment form
    // =========================

    bits(128) Q[integer n]
        assert n >= 0 && n <= 15;
        return D[2*n+1]:D[2*n];

    // Q[] - assignment form
    // =====================

    Q[integer n] = bits(128) value
        assert n >= 0 && n <= 15;
        D[2*n] = value<63:0>;
        D[2*n+1] = value<127:64>;
        return;

A2.6.2 Data types supported by the Advanced SIMD extension

When the Advanced SIMD extension is implemented, it can operate on integer and floating-point data. It defines a set of data types to represent the different data formats. Table A2-5 shows the available formats. Each instruction description specifies the data types that the instruction supports.

Table A2-5 Advanced SIMD data types

    Data type specifier   Meaning
    .<size>               Any element of <size> bits
    .F<size>              Floating-point number of <size> bits
    .I<size>              Signed or unsigned integer of <size> bits
    .P<size>              Polynomial over {0,1} of degree less than <size>
    .S<size>              Signed integer of <size> bits
    .U<size>              Unsigned integer of <size> bits

The polynomial data type is described in Polynomial arithmetic over {0,1} on page A2-67.

The .F16 data type is the half-precision data type currently selected by the FPSCR.AHP bit, see Advanced SIMD and VFP system registers on page A2-28. It is supported only when the half-precision extensions are implemented.

The .F32 data type is the ARM standard single-precision floating-point data type, see Advanced SIMD and VFP single-precision format on page A2-34.
The instruction definitions use a data type specifier to define the data types appropriate to the operation. Figure A2-2 on page A2-26 shows the hierarchy of Advanced SIMD data types.

    .8     .I8  (.S8, .U8)      .P8     -
    .16    .I16 (.S16, .U16)    .P16    .F16 ‡
    .32    .I32 (.S32, .U32)    -       .F32
    .64    .I64 (.S64, .U64)    -       -

    ‡ Supported only if the half-precision extensions are implemented

Figure A2-2 Advanced SIMD data type hierarchy

For example, a multiply instruction must distinguish between integer and floating-point data types. However, some multiply instructions use modulo arithmetic for integer instructions and therefore do not need to distinguish between signed and unsigned inputs.

A multiply instruction that generates a double-width (long) result must specify the input data types as signed or unsigned, because for this operation it does make a difference.

A2.6.3 Advanced SIMD vectors

When the Advanced SIMD extension is implemented, a register can hold one or more packed elements, all of the same size and type. The combination of a register and a data type describes a vector of elements. The vector is considered to be an array of elements of the data type specified in the instruction. The number of elements in the vector is implied by the size of the data elements and the size of the register.

Vector indices are in the range 0 to (number of elements - 1). An index of 0 refers to the least significant end of the vector. Figure A2-3 on page A2-27 shows examples of Advanced SIMD vectors:
    Qn, bits [127:0]
        .F32 [3] [2] [1] [0]                    128-bit vector of single-precision (32-bit) floating-point numbers
        .S16 [7] [6] [5] [4] [3] [2] [1] [0]    128-bit vector of 16-bit signed integers

    Dn, bits [63:0]
        .S32 [1] [0]                            64-bit vector of 32-bit signed integers
        .U16 [3] [2] [1] [0]                    64-bit vector of 16-bit unsigned integers

Figure A2-3 Examples of Advanced SIMD vectors

Pseudocode details of Advanced SIMD vectors

The pseudocode function Elem[] is used to access the element of a specified index and size in a vector:

    // Elem[] - non-assignment form
    // ============================

    bits(size) Elem[bits(N) vector, integer e, integer size]
        assert e >= 0 && (e+1)*size <= N;
        return vector<(e+1)*size-1:e*size>;

    // Elem[] - assignment form
    // ========================

    Elem[bits(N) vector, integer e, integer size] = bits(size) value
        assert e >= 0 && (e+1)*size <= N;
        vector<(e+1)*size-1:e*size> = value;
        return;

A2.6.4 Advanced SIMD and VFP system registers

The Advanced SIMD and VFP extensions have a shared register space for system registers. Only one register in this space is accessible at the application level, see Floating-point Status and Control Register (FPSCR).

See Advanced SIMD and VFP extension system registers on page B1-66 for the system level description of the registers.

Floating-point Status and Control Register (FPSCR)

The Floating-point Status and Control Register (FPSCR) is implemented in any system that implements one or both of:
• the VFP extension
• the Advanced SIMD extension.

The FPSCR provides all necessary User level control of the floating-point system.

The FPSCR is a 32-bit read/write system register, accessible in unprivileged and privileged modes.
The format of the FPSCR is:

    Bits      Field
    [31]      N
    [30]      Z
    [29]      C
    [28]      V
    [27]      QC
    [26]      AHP
    [25]      DN
    [24]      FZ
    [23:22]   RMode
    [21:20]   Stride
    [19]      UNK/SBZP
    [18:16]   Len
    [15]      IDE
    [14:13]   UNK/SBZP
    [12]      IXE
    [11]      UFE
    [10]      OFE
    [9]       DZE
    [8]       IOE
    [7]       IDC
    [6:5]     UNK/SBZP
    [4]       IXC
    [3]       UFC
    [2]       OFC
    [1]       DZC
    [0]       IOC

Bits [31:28]
    Condition code bits. These are updated on floating-point comparison operations. They are not updated on SIMD operations, and do not affect SIMD instructions.
    N, bit [31]    Negative condition code flag.
    Z, bit [30]    Zero condition code flag.
    C, bit [29]    Carry condition code flag.
    V, bit [28]    Overflow condition code flag.

QC, bit [27]
    Cumulative saturation flag, Advanced SIMD only. This bit is set to 1 to indicate that an Advanced SIMD integer operation has saturated since 0 was last written to this bit. For details of saturation, see Pseudocode details of saturation on page A2-9.
    The value of this bit is ignored by the VFP extension. If Advanced SIMD is not implemented this bit is UNK/SBZP.

AHP, bit [26]
    Alternative half-precision control bit:
    0    IEEE half-precision format selected.
    1    Alternative half-precision format selected.
    For more information see Advanced SIMD and VFP half-precision formats on page A2-38.
    If the half-precision extensions are not implemented this bit is UNK/SBZP.

Bits [19,14:13,6:5]
    Reserved. UNK/SBZP.

DN, bit [25]
    Default NaN mode control bit:
    0    NaN operands propagate through to the output of a floating-point operation.
    1    Any operation involving one or more NaNs returns the Default NaN.
    For more information, see NaN handling and the Default NaN on page A2-41.
    The value of this bit only controls VFP arithmetic. Advanced SIMD arithmetic always uses the Default NaN setting, regardless of the value of the DN bit.

FZ, bit [24]
    Flush-to-zero mode control bit:
    0    Flush-to-zero mode disabled. Behavior of the floating-point system is fully compliant with the IEEE 754 standard.
    1    Flush-to-zero mode enabled.
    For more information, see Flush-to-zero on page A2-39.
    The value of this bit only controls VFP arithmetic. Advanced SIMD arithmetic always uses the Flush-to-zero setting, regardless of the value of the FZ bit.

RMode, bits [23:22]
    Rounding Mode control field. The encoding of this field is:
    0b00    Round to Nearest (RN) mode
    0b01    Round towards Plus Infinity (RP) mode
    0b10    Round towards Minus Infinity (RM) mode
    0b11    Round towards Zero (RZ) mode.
    The specified rounding mode is used by almost all VFP floating-point instructions. Advanced SIMD arithmetic always uses the Round to Nearest setting, regardless of the value of the RMode bits.

Stride, bits [21:20] and Len, bits [18:16]
    Use of nonzero values of these fields is deprecated in ARMv7. For details of their use in previous versions of the ARM architecture see Appendix F VFP Vector Operation Support.
    The values of these fields are ignored by the Advanced SIMD extension.

Bits [15,12:8]
    Floating-point exception trap enable bits. These bits are supported only in VFPv2 and VFPv3U. They are reserved, RAZ/SBZP, on a system that implements VFPv3.
    The possible values of each bit are:
    0    Untrapped exception handling selected
    1    Trapped exception handling selected.
    The values of these bits control only VFP arithmetic. Advanced SIMD arithmetic always uses untrapped exception handling, regardless of the values of these bits.
    For more information, see Floating-point exceptions on page A2-42.
    IDE, bit [15]    Input Denormal exception trap enable.
    IXE, bit [12]    Inexact exception trap enable.
    UFE, bit [11]    Underflow exception trap enable.
    OFE, bit [10]    Overflow exception trap enable.
    DZE, bit [9]     Division by Zero exception trap enable.
    IOE, bit [8]     Invalid Operation exception trap enable.

Bits [7,4:0]
    Cumulative exception flags for floating-point exceptions.
    Each of these bits is set to 1 to indicate that the corresponding exception has occurred since 0 was last written to it. How VFP instructions update these bits depends on the value of the corresponding exception trap enable bits:

    Trap enable bit = 0
        If the floating-point exception occurs then the cumulative exception flag is set to 1.
    Trap enable bit = 1
        If the floating-point exception occurs the trap handling software can decide whether to set the cumulative exception flag to 1.

    Advanced SIMD instructions set each cumulative exception flag if the corresponding exception occurs in one or more of the floating-point calculations performed by the instruction, regardless of the setting of the trap enable bits.
    For more information, see Floating-point exceptions on page A2-42.

    IDC, bit [7]    Input Denormal cumulative exception flag.
    IXC, bit [4]    Inexact cumulative exception flag.
    UFC, bit [3]    Underflow cumulative exception flag.
    OFC, bit [2]    Overflow cumulative exception flag.
    DZC, bit [1]    Division by Zero cumulative exception flag.
    IOC, bit [0]    Invalid Operation cumulative exception flag.

If the processor implements the integer-only Advanced SIMD extension and does not implement the VFP extension, all of these bits except QC are UNK/SBZP.

Writes to the FPSCR can have side-effects on various aspects of processor operation. All of these side-effects are synchronous to the FPSCR write. This means they are guaranteed not to be visible to earlier instructions in the execution stream, and they are guaranteed to be visible to later instructions in the execution stream.

Accessing the FPSCR

You read or write the FPSCR using the VMRS and VMSR instructions. For more information, see VMRS on page A8-658 and VMSR on page A8-660.
For example:

    VMRS <Rt>, FPSCR    ; Read Floating-point System Control Register
    VMSR FPSCR, <Rt>    ; Write Floating-point System Control Register

A2.6.5 VFPv3U

VFPv3 does not support the exception trap enable bits in the FPSCR, see Floating-point Status and Control Register (FPSCR) on page A2-28. All floating-point exceptions are untrapped.

The VFPv3U variant of the VFPv3 architecture implements the exception trap enable bits in the FPSCR, and provides exception handling as described in VFP support code on page B1-70. There is a separate trap enable bit for each of the six floating-point exceptions described in Floating-point exceptions on page A2-42. The VFPv3U architecture is otherwise identical to VFPv3.

Trapped exception handling never causes the corresponding cumulative exception bit of the FPSCR to be set to 1. If this behavior is desired, the trap handler routine must use a read, modify, write sequence on the FPSCR to set the cumulative exception bit.

VFPv3U is backwards compatible with VFPv2.

A2.7 Floating-point data types and arithmetic

The VFP extension supports single-precision (32-bit) and double-precision (64-bit) floating-point data types and arithmetic as defined by the IEEE 754 floating-point standard. It also supports the ARM standard modifications to that arithmetic described in Flush-to-zero on page A2-39 and NaN handling and the Default NaN on page A2-41.

Trapped floating-point exception handling is supported in the VFPv3U variant only (see VFPv3U on page A2-31).

ARM standard floating-point arithmetic means IEEE 754 floating-point arithmetic with the ARM standard modifications and:
• the Round to Nearest rounding mode selected
• untrapped exception handling selected for all floating-point exceptions.

The Advanced SIMD extension only supports single-precision ARM standard floating-point arithmetic.
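As an illustration of the definition above, the following Python sketch tests whether a 32-bit FPSCR value selects ARM standard floating-point arithmetic. It uses the FPSCR bit positions given in the register description. The constant and function names are hypothetical and are not part of the architecture, and treating "the ARM standard modifications" as FZ == 1 together with DN == 1 is this sketch's reading of the text.

```python
# Hypothetical helper names; bit positions follow the FPSCR description.
FPSCR_DN = 1 << 25                  # Default NaN mode control bit
FPSCR_FZ = 1 << 24                  # Flush-to-zero mode control bit
RMODE_SHIFT = 22                    # RMode occupies bits [23:22]
RMODE_MASK = 0b11
# Trap enable bits: IDE (bit 15) and IXE, UFE, OFE, DZE, IOE (bits 12:8).
TRAP_ENABLE_MASK = (1 << 15) | (0b11111 << 8)

ROUNDING_MODES = {0b00: "RN", 0b01: "RP", 0b10: "RM", 0b11: "RZ"}


def rounding_mode(fpscr):
    """Return the rounding mode mnemonic encoded in FPSCR.RMode."""
    return ROUNDING_MODES[(fpscr >> RMODE_SHIFT) & RMODE_MASK]


def selects_arm_standard_arithmetic(fpscr):
    """True if fpscr selects flush-to-zero, Default NaN, RN and no traps.

    Assumption: this is the sketch's interpretation of "ARM standard
    floating-point arithmetic", not an architected test.
    """
    return (
        bool(fpscr & FPSCR_FZ)
        and bool(fpscr & FPSCR_DN)
        and rounding_mode(fpscr) == "RN"
        and (fpscr & TRAP_ENABLE_MASK) == 0
    )
```

For example, a value with only DN and FZ set (0x03000000) passes the test, whereas the same value with RMode set to 0b01 or with IOE set does not.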
Note
    Implementations of the VFP extension require support code to be installed in the system if trapped floating-point exception handling is required. See VFP support code on page B1-70.
    They might also require support code to be installed in the system to support other aspects of their floating-point arithmetic. It is IMPLEMENTATION DEFINED which aspects of VFP floating-point arithmetic are supported in a system without support code installed.
    Aspects of floating-point arithmetic that are implemented in support code are likely to run much more slowly than those that are executed in hardware.
    ARM recommends that:
    • To maximize the chance of getting high floating-point performance, software developers use ARM standard floating-point arithmetic.
    • Software developers check whether their systems have support code installed, and if not, observe the IMPLEMENTATION DEFINED restrictions on what operations their VFP implementation can handle without support code.
    • VFP implementation developers implement at least ARM standard floating-point arithmetic in hardware, so that it can be executed without any need for support code.

A2.7.1 ARM standard floating-point input and output values

ARM standard floating-point arithmetic supports the following input formats defined by the IEEE 754 floating-point standard:
• Zeros.
• Normalized numbers.
• Denormalized numbers are flushed to 0 before floating-point operations. For details, see Flush-to-zero on page A2-39.
• NaNs.
• Infinities.

ARM standard floating-point arithmetic supports the Round to Nearest rounding mode defined by the IEEE 754 standard.

ARM standard floating-point arithmetic supports the following output result formats defined by the IEEE 754 standard:
• Zeros.
• Normalized numbers.
• Results that are less than the minimum normalized number are flushed to zero, see Flush-to-zero on page A2-39.
• NaNs produced in floating-point operations are always the default NaN, see NaN handling and the Default NaN on page A2-41.
• Infinities.

A2.7.2 Advanced SIMD and VFP single-precision format

The single-precision floating-point format used by the Advanced SIMD and VFP extensions is as defined by the IEEE 754 standard.

This description includes ARM-specific details that are left open by the standard. It is only intended as an introduction to the formats and to the values they can contain. For full details, especially of the handling of infinities, NaNs and signed zeros, see the IEEE 754 standard.

A single-precision value is a 32-bit word, and must be word-aligned when held in memory. It has the format:

    Bits       Field
    [31]       S
    [30:23]    exponent
    [22:0]     fraction

The interpretation of the format depends on the value of the exponent field, bits [30:23]:

0 < exponent < 0xFF
    The value is a normalized number and is equal to:
        (-1)^S × 2^(exponent - 127) × (1.fraction)
    The minimum positive normalized number is 2^-126, or approximately 1.175 × 10^-38.
    The maximum positive normalized number is (2 - 2^-23) × 2^127, or approximately 3.403 × 10^38.

exponent == 0
    The value is either a zero or a denormalized number, depending on the fraction bits:

    fraction == 0
        The value is a zero. There are two distinct zeros:
        +0    when S==0
        -0    when S==1.
        These usually behave identically. In particular, the result is equal if +0 and -0 are compared as floating-point numbers. However, they yield different results in some circumstances. For example, the sign of the infinity produced as the result of dividing by zero depends on the sign of the zero. The two zeros can be distinguished from each other by performing an integer comparison of the two words.
    fraction != 0
        The value is a denormalized number and is equal to:
            (-1)^S × 2^-126 × (0.fraction)
        The minimum positive denormalized number is 2^-149, or approximately 1.401 × 10^-45.
        Denormalized numbers are flushed to zero in the Advanced SIMD extension. They are optionally flushed to zero in the VFP extension. For details see Flush-to-zero on page A2-39.

exponent == 0xFF
    The value is either an infinity or a Not a Number (NaN), depending on the fraction bits:

    fraction == 0
        The value is an infinity. There are two distinct infinities:
        +∞    When S==0. This represents all positive numbers that are too big to be represented accurately as a normalized number.
        -∞    When S==1. This represents all negative numbers with an absolute value that is too big to be represented accurately as a normalized number.

    fraction != 0
        The value is a NaN, and is either a quiet NaN or a signaling NaN.
        In the VFP architecture, the two types of NaN are distinguished on the basis of their most significant fraction bit, bit [22]:
        bit [22] == 0
            The NaN is a signaling NaN. The sign bit can take any value, and the remaining fraction bits can take any value except all zeros.
        bit [22] == 1
            The NaN is a quiet NaN. The sign bit and remaining fraction bits can take any value.
        For details of the default NaN see NaN handling and the Default NaN on page A2-41.

Note
    NaNs with different sign or fraction bits are distinct NaNs, but this does not mean you can use floating-point comparison instructions to distinguish them. This is because the IEEE 754 standard specifies that a NaN compares as unordered with everything, including itself. However, you can use integer comparisons to distinguish different NaNs.
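The single-precision interpretation rules above can be sketched in Python as follows. The function names are hypothetical and chosen only for illustration. The sketch classifies a 32-bit word by its exponent and fraction fields, and shows that the two zeros compare equal as floating-point numbers but differ when compared as integer words.

```python
import struct


def classify_single(word):
    """Return (sign, kind) for a 32-bit single-precision word."""
    sign = (word >> 31) & 1
    exponent = (word >> 23) & 0xFF      # bits [30:23]
    fraction = word & 0x7FFFFF          # bits [22:0]
    if exponent == 0:
        kind = "zero" if fraction == 0 else "denormal"
    elif exponent == 0xFF:
        if fraction == 0:
            kind = "infinity"
        elif fraction >> 22:            # bit [22] == 1
            kind = "quiet NaN"
        else:                           # bit [22] == 0, fraction nonzero
            kind = "signaling NaN"
    else:
        kind = "normal"
    return sign, kind


def float_to_word(x):
    """Return the 32-bit single-precision encoding of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


# +0 and -0 are equal as floating-point values, but their words differ.
zeros_equal_as_floats = (0.0 == -0.0)
zeros_equal_as_words = (float_to_word(0.0) == float_to_word(-0.0))
```

Here zeros_equal_as_floats is True and zeros_equal_as_words is False, which is the integer comparison technique described for distinguishing +0 from -0.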
A2.7.3 VFP double-precision format

The double-precision floating-point format used by the VFP extension is as defined by the IEEE 754 standard.

This description includes VFP-specific details that are left open by the standard. It is only intended as an introduction to the formats and to the values they can contain. For full details, especially of the handling of infinities, NaNs and signed zeros, see the IEEE 754 standard.

A double-precision value consists of two 32-bit words, with the formats:

    Most significant word:
        Bits       Field
        [31]       S
        [30:20]    exponent
        [19:0]     fraction[51:32]

    Least significant word:
        Bits       Field
        [31:0]     fraction[31:0]

When held in memory, the two words must appear consecutively and must both be word-aligned. The order of the two words depends on the endianness of the memory system:
• In a little-endian memory system, the least significant word appears at the lower memory address and the most significant word at the higher memory address.
• In a big-endian memory system, the most significant word appears at the lower memory address and the least significant word at the higher memory address.

Double-precision values represent numbers, infinities and NaNs in a similar way to single-precision values, with the interpretation of the format depending on the value of the exponent:

0 < exponent < 0x7FF
    The value is a normalized number and is equal to:
        (-1)^S × 2^(exponent - 1023) × (1.fraction)
    The minimum positive normalized number is 2^-1022, or approximately 2.225 × 10^-308.
    The maximum positive normalized number is (2 - 2^-52) × 2^1023, or approximately 1.798 × 10^308.

exponent == 0
    The value is either a zero or a denormalized number, depending on the fraction bits:

    fraction == 0
        The value is a zero. There are two distinct zeros that behave analogously to the two single-precision zeros:
        +0    when S==0
        -0    when S==1.
    fraction != 0
        The value is a denormalized number and is equal to:
            (-1)^S × 2^-1022 × (0.fraction)
        The minimum positive denormalized number is 2^-1074, or approximately 4.941 × 10^-324.
        Optionally, denormalized numbers are flushed to zero in the VFP extension. For details see Flush-to-zero on page A2-39.

exponent == 0x7FF
    The value is either an infinity or a NaN, depending on the fraction bits:

    fraction == 0
        The value is an infinity. As for single-precision, there are two infinities:
        +∞    Plus infinity, when S==0
        -∞    Minus infinity, when S==1.

    fraction != 0
        The value is a NaN, and is either a quiet NaN or a signaling NaN.
        In the VFP architecture, the two types of NaN are distinguished on the basis of their most significant fraction bit, bit [19] of the most significant word:
        bit [19] == 0
            The NaN is a signaling NaN. The sign bit can take any value, and the remaining fraction bits can take any value except all zeros.
        bit [19] == 1
            The NaN is a quiet NaN. The sign bit and the remaining fraction bits can take any value.
        For details of the default NaN see NaN handling and the Default NaN on page A2-41.

A2.7.4 Advanced SIMD and VFP half-precision formats

Two half-precision floating-point formats are used by the half-precision extensions to Advanced SIMD and VFP:
• IEEE half-precision, as described in the revised IEEE 754 standard
• Alternative half-precision.

The description of IEEE half-precision includes ARM-specific details that are left open by the standard, and is only an introduction to the formats and to the values they can contain. For more information, especially on the handling of infinities, NaNs and signed zeros, see the IEEE 754 standard.

For both half-precision floating-point formats, the layout of the 16-bit number is the same.
The format is:

    Bits       Field
    [15]       S
    [14:10]    Exponent
    [9:0]      Fraction

The interpretation of the format depends on the value of the exponent field, bits [14:10], and on which half-precision format is being used.

0 < exponent < 0x1F
    The value is a normalized number and is equal to:
        (-1)^S × 2^(exponent - 15) × (1.fraction)
    The minimum positive normalized number is 2^-14, or approximately 6.104 × 10^-5.
    The maximum positive normalized number is (2 - 2^-10) × 2^15, or 65504.
    Larger normalized numbers can be expressed using the alternative format when the exponent == 0x1F.

exponent == 0
    The value is either a zero or a denormalized number, depending on the fraction bits:

    fraction == 0
        The value is a zero. There are two distinct zeros:
        +0    when S==0
        -0    when S==1.

    fraction != 0
        The value is a denormalized number and is equal to:
            (-1)^S × 2^-14 × (0.fraction)
        The minimum positive denormalized number is 2^-24, or approximately 5.960 × 10^-8.

exponent == 0x1F
    The value depends on which half-precision format is being used:

    IEEE Half-precision
        The value is either an infinity or a Not a Number (NaN), depending on the fraction bits:

        fraction == 0
            The value is an infinity. There are two distinct infinities:
            +∞    When S==0. This represents all positive numbers that are too big to be represented accurately as a normalized number.
            -∞    When S==1. This represents all negative numbers with an absolute value that is too big to be represented accurately as a normalized number.

        fraction != 0
            The value is a NaN, and is either a quiet NaN or a signaling NaN. The two types of NaN are distinguished by their most significant fraction bit, bit [9]:
            bit [9] == 0
                The NaN is a signaling NaN. The sign bit can take any value, and the remaining fraction bits can take any value except all zeros.
            bit [9] == 1
                The NaN is a quiet NaN. The sign bit and remaining fraction bits can take any value.
    Alternative Half-precision
        The value is a normalized number and is equal to:
            (-1)^S × 2^16 × (1.fraction)
        The maximum positive normalized number is (2 - 2^-10) × 2^16, or 131008.

A2.7.5 Flush-to-zero

The performance of floating-point implementations can be significantly reduced when performing calculations involving denormalized numbers and Underflow exceptions. In particular this occurs for implementations that only handle normalized numbers and zeros in hardware, and invoke support code to handle any other types of value. For an algorithm where a significant number of the operands and intermediate results are denormalized numbers, this can result in a considerable loss of performance.

In many of these algorithms, this performance can be recovered, without significantly affecting the accuracy of the final result, by replacing the denormalized operands and intermediate results with zeros. To permit this optimization, VFP implementations have a special processing mode called Flush-to-zero mode. Advanced SIMD implementations always use Flush-to-zero mode.

Behavior in Flush-to-zero mode differs from normal IEEE 754 arithmetic in the following ways:

• All inputs to floating-point operations that are double-precision denormalized numbers or single-precision denormalized numbers are treated as though they were zero. This causes an Input Denormal exception, but does not cause an Inexact exception. The Input Denormal exception occurs only in Flush-to-zero mode.
  The FPSCR contains a cumulative exception bit FPSCR.IDC and trap enable bit FPSCR.IDE corresponding to the Input Denormal exception. For details of how these are used when processing the exception see Advanced SIMD and VFP system registers on page A2-28.
  The occurrence of all exceptions except Input Denormal is determined using the input values after flush-to-zero processing has occurred.
• The result of a floating-point operation is flushed to zero if the result of the operation before rounding satisfies the condition:
      0 < Abs(result) < MinNorm, where:
      - MinNorm == 2^-126 for single-precision
      - MinNorm == 2^-1022 for double-precision.
  This causes the FPSCR.UFC bit to be set to 1, and prevents any Inexact exception from occurring for the operation.
  Underflow exceptions occur only when a result is flushed to zero.
  In a VFPv2 or VFPv3U implementation Underflow exceptions that occur in Flush-to-zero mode are always treated as untrapped, even when the Underflow trap enable bit, FPSCR.UFE, is set to 1.

• An Inexact exception does not occur if the result is flushed to zero, even though the final result of zero is not equivalent to the value that would be produced if the operation were performed with unbounded precision and exponent range.

For information on the FPSCR bits see Floating-point Status and Control Register (FPSCR) on page A2-28.

When an input or a result is flushed to zero the value of the sign bit of the zero is determined as follows:
• In VFPv3 or VFPv3U, it is preserved. That is, the sign bit of the zero matches the sign bit of the input or result that is being flushed to zero.
• In VFPv2, it is IMPLEMENTATION DEFINED whether it is preserved or always positive. The same choice must be made for all cases of flushing an input or result to zero.

Flush-to-zero mode has no effect on half-precision numbers that are inputs to floating-point operations, or results from floating-point operations.

Note
    Flush-to-zero mode is incompatible with the IEEE 754 standard, and must not be used when IEEE 754 compatibility is a requirement. Flush-to-zero mode must be treated with care. Although it can lead to a major performance increase on many algorithms, there are significant limitations on its use.
    These are application dependent:
    • On many algorithms, it has no noticeable effect, because the algorithm does not normally use denormalized numbers.
    • On other algorithms, it can cause exceptions to occur or seriously reduce the accuracy of the results of the algorithm.

A2.7.6 NaN handling and the Default NaN

The IEEE 754 standard specifies that:
• an operation that produces an Invalid Operation floating-point exception generates a quiet NaN as its result if that exception is untrapped
• an operation involving a quiet NaN operand, but not a signaling NaN operand, returns an input NaN as its result.

The VFP behavior when Default NaN mode is disabled adheres to this with the following extra details, where the first operand means the first argument to the pseudocode function call that describes the operation:
• If an untrapped Invalid Operation floating-point exception is produced because one of the operands is a signaling NaN, the quiet NaN result is equal to the signaling NaN with its most significant fraction bit changed to 1. If both operands are signaling NaNs, the result is produced in this way from the first operand.
• If an untrapped Invalid Operation floating-point exception is produced for other reasons, the quiet NaN result is the Default NaN.
• If both operands are quiet NaNs, the result is the first operand.

The VFP behavior when Default NaN mode is enabled, and the Advanced SIMD behavior in all circumstances, is that the Default NaN is the result of all floating-point operations that:
• generate untrapped Invalid Operation floating-point exceptions
• have one or more quiet NaN inputs.

Table A2-6 on page A2-42 shows the format of the default NaN for ARM floating-point processors.

Default NaN mode is selected for VFP by setting the FPSCR.DN bit to 1, see Floating-point Status and Control Register (FPSCR) on page A2-28.
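The NaN selection rules above can be sketched for a two-operand single-precision operation as follows. This is an illustrative Python model, not the architectural pseudocode. The function names are hypothetical, the Invalid Operation exception is assumed to be untrapped, and the Default NaN is assumed to have a sign bit of 0 as in VFPv3.

```python
DEFAULT_NAN = 0x7FC00000    # sign 0, exponent 0xFF, bit [22] == 1, bits [21:0] == 0
QUIET_BIT = 1 << 22         # most significant fraction bit


def is_nan(word):
    """True if the word has exponent 0xFF and a nonzero fraction."""
    return (word & 0x7F800000) == 0x7F800000 and (word & 0x007FFFFF) != 0


def is_signaling_nan(word):
    return is_nan(word) and not (word & QUIET_BIT)


def nan_result(op1, op2, default_nan_mode):
    """Return the NaN result word when an operand is a NaN, else None.

    Assumes untrapped exception handling. op1 is the first operand.
    """
    if not (is_nan(op1) or is_nan(op2)):
        return None
    if default_nan_mode:
        return DEFAULT_NAN
    # A signaling NaN takes precedence and is returned in quieted form,
    # taken from the first operand if both operands are signaling NaNs.
    if is_signaling_nan(op1):
        return op1 | QUIET_BIT
    if is_signaling_nan(op2):
        return op2 | QUIET_BIT
    # Only quiet NaNs remain: the first operand wins if both are NaNs.
    return op1 if is_nan(op1) else op2
```

For example, with Default NaN mode disabled, nan_result(0x7FC00005, 0xFF800002, False) returns the quieted second operand, because a signaling NaN operand causes the Invalid Operation exception even when the first operand is a quiet NaN.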
Other aspects of the functionality of the Invalid Operation exception are not affected by Default NaN mode. These are that:
• If untrapped, it causes the FPSCR.IOC bit to be set to 1.
• If trapped, it causes a user trap handler to be invoked. This is only possible in VFPv2 and VFPv3U.

Table A2-6 Default NaN encoding

                Half-precision, IEEE Format       Single-precision                   Double-precision
    Sign bit    0                                 0 (a)                              0 (a)
    Exponent    0x1F                              0xFF                               0x7FF
    Fraction    bit [9] == 1, bits [8:0] == 0     bit [22] == 1, bits [21:0] == 0    bit [51] == 1, bits [50:0] == 0

    a. In VFPv2, the sign bit of the Default NaN is UNKNOWN.

A2.7.7 Floating-point exceptions

The Advanced SIMD and VFP extensions record the following floating-point exceptions in the FPSCR cumulative flags, see Floating-point Status and Control Register (FPSCR) on page A2-28:

IOC    Invalid Operation. The flag is set to 1 if the result of an operation has no mathematical value or cannot be represented. Examples include infinity * 0 and +infinity + (-infinity). These tests are made after flush-to-zero processing. For example, if flush-to-zero mode is selected, multiplying a denormalized number and an infinity is treated as 0 * infinity and causes an Invalid Operation floating-point exception.
       IOC is also set on any floating-point operation with one or more signaling NaNs as operands, except for negation and absolute value, as described in Negation and absolute value on page A2-47.

DZC    Division by Zero. The flag is set to 1 if a divide operation has a zero divisor and a dividend that is not zero, an infinity or a NaN. These tests are made after flush-to-zero processing, so if flush-to-zero processing is selected, a denormalized dividend is treated as zero and prevents Division by Zero from occurring, and a denormalized divisor is treated as zero and causes Division by Zero to occur if the dividend is a normalized number.
       For the reciprocal and reciprocal square root estimate functions the dividend is assumed to be +1.0. This means that a zero or denormalized operand to these functions sets the DZC flag.

OFC    Overflow. The flag is set to 1 if the absolute value of the result of an operation, produced after rounding, is greater than the maximum positive normalized number for the destination precision.

UFC    Underflow. The flag is set to 1 if the absolute value of the result of an operation, produced before rounding, is less than the minimum positive normalized number for the destination precision, and the rounded result is inexact.
       The criteria for the Underflow exception to occur are different in Flush-to-zero mode. For details, see Flush-to-zero on page A2-39.

IXC    Inexact. The flag is set to 1 if the result of an operation is not equivalent to the value that would be produced if the operation were performed with unbounded precision and exponent range.
       The criteria for the Inexact exception to occur are different in Flush-to-zero mode. For details, see Flush-to-zero on page A2-39.

IDC    Input Denormal. The flag is set to 1 if a denormalized input operand is replaced in the computation by a zero, as described in Flush-to-zero on page A2-39.

With the Advanced SIMD extension and the VFPv3 extension these are non-trapping exceptions and the data-processing instructions do not generate any trapped exceptions.

With the VFPv2 and VFPv3U extensions:
• These exceptions can be trapped, by setting trap enable flags in the FPSCR, see VFPv3U on page A2-31. Trapped floating-point exceptions are delivered to user code in an IMPLEMENTATION DEFINED fashion.
• The definitions of the floating-point exceptions change as follows:
  - if the Underflow exception is trapped, it occurs if the absolute value of the result of an operation, produced before rounding, is less than the minimum positive normalized number for the destination precision, regardless of whether the rounded result is inexact
  - higher priority trapped exceptions can prevent lower priority exceptions from occurring, as described in Combinations of exceptions on page A2-44.

Table A2-7 shows the default results of the floating-point exceptions:

Table A2-7 Floating-point exception default results

    Exception type            Default result for positive sign    Default result for negative sign
    IOC, Invalid Operation    Quiet NaN                           Quiet NaN
    DZC, Division by Zero     +∞ (plus infinity)                  -∞ (minus infinity)
    OFC, Overflow             RN, RP: +∞ (plus infinity)          RN, RM: -∞ (minus infinity)
                              RM, RZ: +MaxNorm                    RP, RZ: -MaxNorm
    UFC, Underflow            Normal rounded result               Normal rounded result
    IXC, Inexact              Normal rounded result               Normal rounded result
    IDC, Input Denormal       Normal rounded result               Normal rounded result

In Table A2-7 on page A2-43:
    MaxNorm    The maximum normalized number of the destination precision
    RM         Round towards Minus Infinity mode, as defined in the IEEE 754 standard
    RN         Round to Nearest mode, as defined in the IEEE 754 standard
    RP         Round towards Plus Infinity mode, as defined in the IEEE 754 standard
    RZ         Round towards Zero mode, as defined in the IEEE 754 standard

• For Invalid Operation exceptions, for details of which quiet NaN is produced as the default result see NaN handling and the Default NaN on page A2-41.
• For Division by Zero exceptions, the sign bit of the default result is determined normally for a division. This means it is the exclusive OR of the sign bits of the two operands.
• For Overflow exceptions, the sign bit of the default result is determined normally for the overflowing operation.

Combinations of exceptions

The following pseudocode functions perform floating-point operations:

    FixedToFP()           FPAbs()               FPAdd()               FPCompare()
    FPCompareGE()         FPCompareGT()         FPDiv()               FPDoubleToSingle()
    FPMax()               FPMin()               FPMul()               FPNeg()
    FPRecipEstimate()     FPRecipStep()         FPRSqrtEstimate()     FPRSqrtStep()
    FPSingleToDouble()    FPSqrt()              FPSub()               FPToFixed()

All of these operations except FPAbs() and FPNeg() can generate floating-point exceptions.

More than one exception can occur on the same operation. The only combinations of exceptions that can occur are:
• Overflow with Inexact
• Underflow with Inexact
• Input Denormal with other exceptions.

When none of the exceptions caused by an operation are trapped, any exception that occurs causes the associated cumulative flag in the FPSCR to be set.

When one or more exceptions caused by an operation are trapped, the behavior of the instruction depends on the priority of the exceptions. The Inexact exception is treated as lowest priority, and Input Denormal as highest priority:
• If the higher priority exception is trapped, its trap handler is called. It is IMPLEMENTATION DEFINED whether the parameters to the trap handler include information about the lower priority exception. Apart from this, the lower priority exception is ignored in this case.
• If the higher priority exception is untrapped, its cumulative bit is set to 1 and its default result is evaluated. Then the lower priority exception is handled normally, using this default result.

Some floating-point instructions specify more than one floating-point operation, as indicated by the pseudocode descriptions of the instruction.
In such cases, an exception on one operation is treated as higher priority than an exception on another operation if the occurrence of the second exception depends on the result of the first operation. Otherwise, it is UNPREDICTABLE which exception is treated as higher priority.

For example, a VMLA.F32 instruction specifies a floating-point multiplication followed by a floating-point addition. The addition can generate Overflow, Underflow and Inexact exceptions, all of which depend on both operands to the addition and so are treated as lower priority than any exception on the multiplication. The same applies to Invalid Operation exceptions on the addition caused by adding opposite-signed infinities.

The addition can also generate an Input Denormal exception, caused by the addend being a denormalized number while in Flush-to-zero mode. It is UNPREDICTABLE which of an Input Denormal exception on the addition and an exception on the multiplication is treated as higher priority, because the occurrence of the Input Denormal exception does not depend on the result of the multiplication. The same applies to an Invalid Operation exception on the addition caused by the addend being a signaling NaN.

Note
    Like other details of VFP instruction execution, these rules about exception handling apply to the overall results produced by an instruction when the system uses a combination of hardware and support code to implement it. See VFP support code on page B1-70 for more information.
    These principles also apply to the multiple floating-point operations generated by VFP instructions in the deprecated VFP vector mode of operation. For details of this mode of operation see Appendix F VFP Vector Operation Support.
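The default results listed in Table A2-7 for the Overflow and Division by Zero exceptions can be sketched for single-precision as follows. This is an illustrative Python model only. The function names are hypothetical, rounding modes are passed as the mnemonics RN, RP, RM and RZ, and the encodings of infinity and MaxNorm are the single-precision bit patterns.

```python
PLUS_INFINITY = 0x7F800000
MINUS_INFINITY = 0xFF800000
PLUS_MAX_NORM = 0x7F7FFFFF      # exponent 0xFE, fraction all ones
MINUS_MAX_NORM = 0xFF7FFFFF


def overflow_default_result(sign, rmode):
    """Default result word of an untrapped Overflow exception.

    sign is the sign bit of the overflowing result (0 or 1) and rmode is
    one of "RN", "RP", "RM" or "RZ".
    """
    if rmode not in ("RN", "RP", "RM", "RZ"):
        raise ValueError("unknown rounding mode: %r" % (rmode,))
    if sign == 0:
        # Positive overflow: rounding up or to nearest gives +infinity.
        return PLUS_INFINITY if rmode in ("RN", "RP") else PLUS_MAX_NORM
    # Negative overflow: rounding down or to nearest gives -infinity.
    return MINUS_INFINITY if rmode in ("RN", "RM") else MINUS_MAX_NORM


def division_by_zero_default_result(dividend_sign, divisor_sign):
    """Default result word of an untrapped Division by Zero exception.

    The sign is the exclusive OR of the sign bits of the two operands.
    """
    return MINUS_INFINITY if dividend_sign ^ divisor_sign else PLUS_INFINITY
```

For example, a positive overflow in Round towards Zero mode yields +MaxNorm rather than an infinity, because rounding towards zero never moves a finite result away from zero.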
A2.7.8 Pseudocode details of floating-point operations

This section contains pseudocode definitions of the floating-point operations used by the architecture.

Generation of specific floating-point values

The following pseudocode functions generate specific floating-point values. The sign argument of FPInfinity(), FPMaxNormal(), and FPZero() is '0' for the positive version and '1' for the negative version.

    // FPZero()
    // ========

    bits(N) FPZero(bit sign, integer N)
        assert N == 16 || N == 32 || N == 64;
        if N == 16 then
            return sign : '00000 0000000000';
        elsif N == 32 then
            return sign : '00000000 00000000000000000000000';
        else
            return sign : '00000000000 0000000000000000000000000000000000000000000000000000';

    // FPTwo()
    // =======

    bits(N) FPTwo(integer N)
        assert N == 32 || N == 64;
        if N == 32 then
            return '0 10000000 00000000000000000000000';
        else
            return '0 10000000000 0000000000000000000000000000000000000000000000000000';

    // FPThree()
    // =========

    bits(N) FPThree(integer N)
        assert N == 32 || N == 64;
        if N == 32 then
            return '0 10000000 10000000000000000000000';
        else
            return '0 10000000000 1000000000000000000000000000000000000000000000000000';

    // FPMaxNormal()
    // =============

    bits(N) FPMaxNormal(bit sign, integer N)
        assert N == 16 || N == 32 || N == 64;
        if N == 16 then
            return sign : '11110 1111111111';
        elsif N == 32 then
            return sign : '11111110 11111111111111111111111';
        else
            return sign : '11111111110 1111111111111111111111111111111111111111111111111111';
// FPInfinity()
// ============

bits(N) FPInfinity(bit sign, integer N)
    assert N == 16 || N == 32 || N == 64;
    if N == 16 then
        return sign : '11111 0000000000';
    elsif N == 32 then
        return sign : '11111111 00000000000000000000000';
    else
        return sign : '11111111111 0000000000000000000000000000000000000000000000000000';

// FPDefaultNaN()
// ==============

bits(N) FPDefaultNaN(integer N)
    assert N == 16 || N == 32 || N == 64;
    if N == 16 then
        return '0 11111 1000000000';
    elsif N == 32 then
        return '0 11111111 10000000000000000000000';
    else
        return '0 11111111111 1000000000000000000000000000000000000000000000000000';

Note
This definition of FPDefaultNaN() applies to VFPv3 and VFPv3U. For VFPv2, the sign bit of the result is a single-bit UNKNOWN value, instead of 0.

Negation and absolute value

The floating-point negation and absolute value operations only affect the sign bit. They do not treat NaN operands specially, nor denormalized number operands when flush-to-zero is selected.

// FPNeg()
// =======

bits(N) FPNeg(bits(N) operand)
    assert N == 32 || N == 64;
    return NOT(operand<N-1>) : operand<N-2:0>;

// FPAbs()
// =======

bits(N) FPAbs(bits(N) operand)
    assert N == 32 || N == 64;
    return '0' : operand<N-2:0>;

Floating-point value unpacking

The FPUnpack() function determines the type and numerical value of a floating-point number. It also does flush-to-zero processing on input operands.

enumeration FPType {FPType_Nonzero, FPType_Zero, FPType_Infinity, FPType_QNaN, FPType_SNaN};

// FPUnpack()
// ==========
//
// Unpack a floating-point number into its type, sign bit and the real number
// that it represents. The real number result has the correct sign for numbers
// and infinities, is very large in magnitude for infinities, and is 0.0 for
// NaNs. (These values are chosen to simplify the description of comparisons
// and conversions.)
//
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

(FPType, bit, real) FPUnpack(bits(N) fpval, bits(32) fpscr_val)
    assert N == 16 || N == 32 || N == 64;

    if N == 16 then
        sign = fpval<15>;
        exp = fpval<14:10>;
        frac = fpval<9:0>;
        if IsZero(exp) then
            // Produce zero if value is zero
            if IsZero(frac) then
                type = FPType_Zero; value = 0.0;
            else
                type = FPType_Nonzero; value = 2^-14 * (UInt(frac) * 2^-10);
        elsif IsOnes(exp) && fpscr_val<26> == '0' then // Infinity or NaN in IEEE format
            if IsZero(frac) then
                type = FPType_Infinity; value = 2^1000000;
            else
                type = if frac<9> == '1' then FPType_QNaN else FPType_SNaN;
                value = 0.0;
        else
            type = FPType_Nonzero; value = 2^(UInt(exp)-15) * (1.0 + UInt(frac) * 2^-10);

    elsif N == 32 then
        sign = fpval<31>;
        exp = fpval<30:23>;
        frac = fpval<22:0>;
        if IsZero(exp) then
            // Produce zero if value is zero or flush-to-zero is selected.
            if IsZero(frac) || fpscr_val<24> == '1' then
                type = FPType_Zero; value = 0.0;
                if !IsZero(frac) then // Denormalized input flushed to zero
                    FPProcessException(FPExc_InputDenorm, fpscr_val);
            else
                type = FPType_Nonzero; value = 2^-126 * (UInt(frac) * 2^-23);
        elsif IsOnes(exp) then
            if IsZero(frac) then
                type = FPType_Infinity; value = 2^1000000;
            else
                type = if frac<22> == '1' then FPType_QNaN else FPType_SNaN;
                value = 0.0;
        else
            type = FPType_Nonzero; value = 2^(UInt(exp)-127) * (1.0 + UInt(frac) * 2^-23);

    else // N == 64
        sign = fpval<63>;
        exp = fpval<62:52>;
        frac = fpval<51:0>;
        if IsZero(exp) then
            // Produce zero if value is zero or flush-to-zero is selected.
            if IsZero(frac) || fpscr_val<24> == '1' then
                type = FPType_Zero; value = 0.0;
                if !IsZero(frac) then // Denormalized input flushed to zero
                    FPProcessException(FPExc_InputDenorm, fpscr_val);
            else
                type = FPType_Nonzero; value = 2^-1022 * (UInt(frac) * 2^-52);
        elsif IsOnes(exp) then
            if IsZero(frac) then
                type = FPType_Infinity; value = 2^1000000;
            else
                type = if frac<51> == '1' then FPType_QNaN else FPType_SNaN;
                value = 0.0;
        else
            type = FPType_Nonzero; value = 2^(UInt(exp)-1023) * (1.0 + UInt(frac) * 2^-52);

    if sign == '1' then value = -value;
    return (type, sign, value);

Floating-point exception and NaN handling

The FPProcessException() procedure checks whether a floating-point exception is trapped, and handles it accordingly:

enumeration FPExc {FPExc_InvalidOp, FPExc_DivideByZero, FPExc_Overflow,
                   FPExc_Underflow, FPExc_Inexact, FPExc_InputDenorm};

// FPProcessException()
// ====================
//
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

FPProcessException(FPExc exception, bits(32) fpscr_val)
    // Get appropriate FPSCR bit numbers
    case exception of
        when FPExc_InvalidOp     enable = 8;   cumul = 0;
        when FPExc_DivideByZero  enable = 9;   cumul = 1;
        when FPExc_Overflow      enable = 10;  cumul = 2;
        when FPExc_Underflow     enable = 11;  cumul = 3;
        when FPExc_Inexact       enable = 12;  cumul = 4;
        when FPExc_InputDenorm   enable = 15;  cumul = 7;
    if fpscr_val<enable> == '1' then
        IMPLEMENTATION_DEFINED floating-point trap handling;
    else
        FPSCR<cumul> = '1';
    return;

The FPProcessNaN() function processes a NaN operand, producing the correct result value and generating an Invalid Operation exception if necessary:

// FPProcessNaN()
// ==============
//
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.
bits(N) FPProcessNaN(FPType type, bits(N) operand, bits(32) fpscr_val)
    assert N == 32 || N == 64;
    topfrac = if N == 32 then 22 else 51;
    result = operand;
    if type == FPType_SNaN then
        result<topfrac> = '1';
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    if fpscr_val<25> == '1' then // DefaultNaN requested
        result = FPDefaultNaN(N);
    return result;

The FPProcessNaNs() function performs the standard NaN processing for a two-operand operation:

// FPProcessNaNs()
// ===============
//
// The boolean part of the return value says whether a NaN has been found and
// processed. The bits(N) part is only relevant if it has and supplies the
// result of the operation.
//
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

(boolean, bits(N)) FPProcessNaNs(FPType type1, FPType type2,
                                 bits(N) op1, bits(N) op2, bits(32) fpscr_val)
    assert N == 32 || N == 64;
    if type1 == FPType_SNaN then
        done = TRUE; result = FPProcessNaN(type1, op1, fpscr_val);
    elsif type2 == FPType_SNaN then
        done = TRUE; result = FPProcessNaN(type2, op2, fpscr_val);
    elsif type1 == FPType_QNaN then
        done = TRUE; result = FPProcessNaN(type1, op1, fpscr_val);
    elsif type2 == FPType_QNaN then
        done = TRUE; result = FPProcessNaN(type2, op2, fpscr_val);
    else
        done = FALSE; result = Zeros(N); // 'Don't care' result
    return (done, result);

Floating-point rounding

The FPRound() function rounds and encodes a floating-point result value to a specified destination format. This includes processing Overflow, Underflow and Inexact floating-point exceptions and performing flush-to-zero processing on result values.

// FPRound()
// =========
//
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.
bits(N) FPRound(real result, integer N, bits(32) fpscr_val)
    assert N == 16 || N == 32 || N == 64;
    assert result != 0.0;

    // Obtain format parameters - minimum exponent, numbers of exponent and fraction bits.
    if N == 16 then
        minimum_exp = -14; E = 5; F = 10;
    elsif N == 32 then
        minimum_exp = -126; E = 8; F = 23;
    else // N == 64
        minimum_exp = -1022; E = 11; F = 52;

    // Split value into sign, unrounded mantissa and exponent.
    if result < 0.0 then
        sign = '1'; mantissa = -result;
    else
        sign = '0'; mantissa = result;
    exponent = 0;
    while mantissa < 1.0 do
        mantissa = mantissa * 2.0; exponent = exponent - 1;
    while mantissa >= 2.0 do
        mantissa = mantissa / 2.0; exponent = exponent + 1;

    // Deal with flush-to-zero.
    if fpscr_val<24> == '1' && N != 16 && exponent < minimum_exp then
        result = FPZero(sign, N);
        FPSCR.UFC = '1'; // Flush-to-zero never generates a trapped exception
    else
        // Start creating the exponent value for the result. Start by biasing the actual exponent
        // so that the minimum exponent becomes 1, lower values 0 (indicating possible underflow).
        biased_exp = Max(exponent - minimum_exp + 1, 0);
        if biased_exp == 0 then mantissa = mantissa / 2^(minimum_exp - exponent);

        // Get the unrounded mantissa as an integer, and the "units in last place" rounding error.
        int_mant = RoundDown(mantissa * 2^F); // < 2^F if biased_exp == 0, >= 2^F if not
        error = mantissa * 2^F - int_mant;

        // Underflow occurs if exponent is too small before rounding, and result is inexact or
        // the Underflow exception is trapped.
        if biased_exp == 0 && (error != 0.0 || fpscr_val<11> == '1') then
            FPProcessException(FPExc_Underflow, fpscr_val);

        // Round result according to rounding mode.
        case fpscr_val<23:22> of
            when '00' // Round to Nearest (rounding to even if exactly halfway)
                round_up = (error > 0.5 || (error == 0.5 && int_mant<0> == '1'));
                overflow_to_inf = TRUE;
            when '01' // Round towards Plus Infinity
                round_up = (error != 0.0 && sign == '0');
                overflow_to_inf = (sign == '0');
            when '10' // Round towards Minus Infinity
                round_up = (error != 0.0 && sign == '1');
                overflow_to_inf = (sign == '1');
            when '11' // Round towards Zero
                round_up = FALSE;
                overflow_to_inf = FALSE;
        if round_up then
            int_mant = int_mant + 1;
            if int_mant == 2^F then // Rounded up from denormalized to normalized
                biased_exp = 1;
            if int_mant == 2^(F+1) then // Rounded up to next exponent
                biased_exp = biased_exp + 1; int_mant = int_mant DIV 2;

        // Deal with overflow and generate result.
        if N != 16 || fpscr_val<26> == '0' then // Single, double or IEEE half precision
            if biased_exp >= 2^E - 1 then
                result = if overflow_to_inf then FPInfinity(sign, N) else FPMaxNormal(sign, N);
                FPProcessException(FPExc_Overflow, fpscr_val);
            else
                result = sign : biased_exp<E-1:0> : int_mant<F-1:0>;
        else // Alternative half precision
            if biased_exp >= 2^E then
                result = sign : Ones(15);
                FPProcessException(FPExc_InvalidOp, fpscr_val);
                error = 0.0; // avoid an Inexact exception
            else
                result = sign : biased_exp<E-1:0> : int_mant<F-1:0>;

        // Deal with Inexact exception.
        if error != 0 then
            FPProcessException(FPExc_Inexact, fpscr_val);

    return result;

Selection of ARM standard floating-point arithmetic

StandardFPSCRValue is an FPSCR value that selects ARM standard floating-point arithmetic. Most of the arithmetic functions have a boolean fpscr_controlled argument that is TRUE for VFP operations and FALSE for Advanced SIMD operations, and that selects between using the real FPSCR value and this value.
// StandardFPSCRValue()
// ====================

bits(32) StandardFPSCRValue()
    return '00000' : FPSCR<26> : '11000000000000000000000000';

Comparisons

The FPCompare() function compares two floating-point numbers, producing an (N,Z,C,V) flags result as shown in Table A2-8:

Table A2-8 VFP comparison flag values

Comparison result   N  Z  C  V
Equal               0  1  1  0
Less than           1  0  0  0
Greater than        0  0  1  0
Unordered           0  0  1  1

This result is used to define the VCMP instruction in the VFP extension. The VCMP instruction writes these flag values in the FPSCR. After using a VMRS instruction to transfer them to the APSR, they can be used to control conditional execution as shown in Table A8-1 on page A8-8.

// FPCompare()
// ===========

(bit, bit, bit, bit) FPCompare(bits(N) op1, bits(N) op2, boolean quiet_nan_exc,
                               boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = ('0','0','1','1');
        if type1==FPType_SNaN || type2==FPType_SNaN || quiet_nan_exc then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        if value1 == value2 then
            result = ('0','1','1','0');
        elsif value1 < value2 then
            result = ('1','0','0','0');
        else // value1 > value2
            result = ('0','0','1','0');
    return result;

The FPCompareEQ(), FPCompareGE() and FPCompareGT() functions are used to describe Advanced SIMD instructions that perform floating-point comparisons.
// FPCompareEQ()
// =============

boolean FPCompareEQ(bits(32) op1, bits(32) op2, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = FALSE;
        if type1==FPType_SNaN || type2==FPType_SNaN then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        result = (value1 == value2);
    return result;

// FPCompareGE()
// =============

boolean FPCompareGE(bits(32) op1, bits(32) op2, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = FALSE;
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        result = (value1 >= value2);
    return result;

// FPCompareGT()
// =============

boolean FPCompareGT(bits(32) op1, bits(32) op2, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = FALSE;
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        result = (value1 > value2);
    return result;
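The flag selection that FPCompare() performs, summarized in Table A2-8, can be sketched in C for host double values as shown below. This is an illustration under stated assumptions, not the architectural definition: the type fp_flags and the function fp_compare_flags are hypothetical names, and the sketch does not model flush-to-zero of denormalized inputs or the signalling of Invalid Operation exceptions.

```c
#include <math.h>

/* Hypothetical result type: one field per condition flag. */
typedef struct {
    unsigned n, z, c, v;
} fp_flags;

/* Select (N,Z,C,V) for a comparison of two host doubles.
 * Not modelled: flush-to-zero of inputs, Invalid Operation signalling. */
static fp_flags fp_compare_flags(double op1, double op2)
{
    fp_flags f = {0, 0, 0, 0};

    if (isnan(op1) || isnan(op2)) {
        f.c = 1; f.v = 1;            /* Unordered:    0011 */
    } else if (op1 == op2) {
        f.z = 1; f.c = 1;            /* Equal:        0110 (+0 equals -0) */
    } else if (op1 < op2) {
        f.n = 1;                     /* Less than:    1000 */
    } else {
        f.c = 1;                     /* Greater than: 0010 */
    }
    return f;
}
```

The NaN test comes first for the same reason as in the pseudocode: every ordered comparison involving a NaN is false, so the later branches cannot distinguish the unordered case.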
Maximum and minimum

// FPMax()
// =======

bits(N) FPMax(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        if type1 == FPType_Zero && type2 == FPType_Zero && sign1 == NOT(sign2) then
            // Opposite-signed zeros produce +0.0
            result = FPZero('0', N);
        else
            // All other cases can be evaluated on the values produced by FPUnpack()
            result = if value1 > value2 then op1 else op2;
    return result;

// FPMin()
// =======

bits(N) FPMin(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        if type1 == FPType_Zero && type2 == FPType_Zero && sign1 == NOT(sign2) then
            // Opposite-signed zeros produce -0.0
            result = FPZero('1', N);
        else
            // All other cases can be evaluated on the values produced by FPUnpack()
            result = if value1 < value2 then op1 else op2;
    return result;

Addition and subtraction

// FPAdd()
// =======

bits(N) FPAdd(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
        if inf1 && inf2 && sign1 == NOT(sign2) then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif (inf1 && sign1 == '0') || (inf2 && sign2 == '0') then
            result = FPInfinity('0', N);
        elsif (inf1 && sign1 == '1') || (inf2 && sign2 == '1') then
            result = FPInfinity('1', N);
        elsif zero1 && zero2 && sign1 == sign2 then
            result = FPZero(sign1, N);
        else
            result_value = value1 + value2;
            if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
                result_sign = if fpscr_val<23:22> == '10' then '1' else '0';
                result = FPZero(result_sign, N);
            else
                result = FPRound(result_value, N, fpscr_val);
    return result;

// FPSub()
// =======

bits(N) FPSub(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
        if inf1 && inf2 && sign1 == sign2 then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif (inf1 && sign1 == '0') || (inf2 && sign2 == '1') then
            result = FPInfinity('0', N);
        elsif (inf1 && sign1 == '1') || (inf2 && sign2 == '0') then
            result = FPInfinity('1', N);
        elsif zero1 && zero2 && sign1 == NOT(sign2) then
            result = FPZero(sign1, N);
        else
            result_value = value1 - value2;
            if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
                result_sign = if fpscr_val<23:22> == '10' then '1' else '0';
                result = FPZero(result_sign, N);
            else
                result = FPRound(result_value, N, fpscr_val);
    return result;
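The sign of a zero result in FPAdd() is decided by two branches of the pseudocode: same-signed zero operands keep their sign, and any other exactly zero sum is negative only in Round towards Minus Infinity mode. The following C sketch isolates that decision. The enumeration and the function zero_sum_sign are hypothetical names introduced for illustration, and the sketch assumes the caller has already established that the sum is exactly zero.

```c
/* Rounding-mode encodings as held in FPSCR<23:22>. */
enum rmode {
    RMODE_NEAREST = 0,      /* '00' Round to Nearest            */
    RMODE_PLUS_INF = 1,     /* '01' Round towards Plus Infinity  */
    RMODE_MINUS_INF = 2,    /* '10' Round towards Minus Infinity */
    RMODE_ZERO = 3          /* '11' Round towards Zero           */
};

/* Hypothetical helper: sign bit (0 or 1) of an exactly zero result of
 * op1 + op2. sign1 and sign2 are the operand sign bits; both_zero is
 * nonzero when both operands are zeros. */
static unsigned zero_sum_sign(unsigned sign1, unsigned sign2, int both_zero,
                              enum rmode mode)
{
    if (both_zero && sign1 == sign2)
        return sign1;                        /* (+0)+(+0) = +0, (-0)+(-0) = -0 */
    return (mode == RMODE_MINUS_INF) ? 1u : 0u;
}
```

For FPSub() the same helper applies after inverting sign2, because the pseudocode for subtraction differs from addition only in how the second operand's sign is treated.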
Multiplication and division

// FPMul()
// =======

bits(N) FPMul(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
        if (inf1 && zero2) || (zero1 && inf2) then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif inf1 || inf2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPInfinity(result_sign, N);
        elsif zero1 || zero2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPZero(result_sign, N);
        else
            result = FPRound(value1*value2, N, fpscr_val);
    return result;

// FPDiv()
// =======

bits(N) FPDiv(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
        if (inf1 && inf2) || (zero1 && zero2) then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif inf1 || zero2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPInfinity(result_sign, N);
            if !inf1 then FPProcessException(FPExc_DivideByZero, fpscr_val);
        elsif zero1 || inf2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPZero(result_sign, N);
        else
            result = FPRound(value1/value2, N, fpscr_val);
    return result;

Reciprocal estimate and step

The Advanced SIMD extension includes instructions that support Newton-Raphson calculation of the reciprocal of a number.

The VRECPE instruction produces the initial estimate of the reciprocal. It uses the following pseudocode functions:

// FPRecipEstimate()
// =================

bits(32) FPRecipEstimate(bits(32) operand)
    (type,sign,value) = FPUnpack(operand, StandardFPSCRValue());
    if type == FPType_SNaN || type == FPType_QNaN then
        result = FPProcessNaN(type, operand, StandardFPSCRValue());
    elsif type == FPType_Infinity then
        result = FPZero(sign, 32);
    elsif type == FPType_Zero then
        result = FPInfinity(sign, 32);
        FPProcessException(FPExc_DivideByZero, StandardFPSCRValue());
    elsif Abs(value) >= 2^126 then
        // Result underflows to zero of correct sign
        result = FPZero(sign, 32);
        FPProcessException(FPExc_Underflow, StandardFPSCRValue());
    else
        // Operand must be normalized, since denormalized numbers are flushed to zero. Scale to a
        // double-precision value in the range 0.5 <= x < 1.0, and calculate result exponent.
        // Scaled value has copied sign bit, exponent = 1022 = double-precision biased version of
        // -1, fraction = original fraction extended with zeros.
        scaled = operand<31> : '01111111110' : operand<22:0> : Zeros(29);
        result_exp = 253 - UInt(operand<30:23>); // In range 253-252 = 1 to 253-1 = 252

        // Call C function to get reciprocal estimate of scaled value.
        estimate = recip_estimate(scaled);

        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256. Convert
        // to scaled single-precision result with copied sign bit and high-order fraction bits,
        // and exponent calculated above.
        result = estimate<63> : result_exp<7:0> : estimate<51:29>;
    return result;

// UnsignedRecipEstimate()
// =======================

bits(32) UnsignedRecipEstimate(bits(32) operand)
    if operand<31> == '0' then
        // Operands <= 0x7FFFFFFF produce 0xFFFFFFFF
        result = Ones(32);
    else
        // Generate double-precision value = operand * 2^-32. This has zero sign bit,
        // exponent = 1022 = double-precision biased version of -1, fraction taken from
        // operand, excluding its most significant bit.
        dp_operand = '0 01111111110' : operand<30:0> : Zeros(21);

        // Call C function to get reciprocal estimate of scaled value.
        estimate = recip_estimate(dp_operand);

        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256.
        // Multiply by 2^31 and convert to an unsigned integer - this just involves
        // concatenating the implicit units bit with the top 31 fraction bits.
        result = '1' : estimate<51:21>;
    return result;

where recip_estimate() is defined by the following C function:

double recip_estimate(double a)
{
    int q, s;
    double r;
    q = (int)(a * 512.0);                   /* a in units of 1/512 rounded down */
    r = 1.0 / (((double)q + 0.5) / 512.0);  /* reciprocal r */
    s = (int)(256.0 * r + 0.5);             /* r in units of 1/256 rounded to nearest */
    return (double)s / 256.0;
}

Table A2-9 shows the results where input values are out of range.

Table A2-9 VRECPE results for out-of-range inputs

Number type      Input Vm[i]                      Result Vd[i]
Integer          <= 0x7FFFFFFF                    0xFFFFFFFF
Floating-point   NaN                              Default NaN
Floating-point   +/– 0 or denormalized number     +/– Infinity (a)
Floating-point   +/– infinity                     +/– 0
Floating-point   Absolute value >= 2^126          +/– 0

a. The Division by Zero exception bit in the FPSCR (FPSCR[1]) is set.

The Newton-Raphson iteration:

    x_{n+1} = x_n (2 - d x_n)

converges to (1/d) if x_0 is the result of VRECPE applied to d.
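To illustrate how quickly this iteration converges from an 8-bit estimate, the following C sketch applies it in host double arithmetic. recip_estimate() is reproduced from this section so that the block is self-contained. The driver recip_refine() is a hypothetical helper, and the sketch assumes an input d in the range 0.5 <= d < 1.0, which is the range recip_estimate() is defined for. It does not reproduce the single-precision rounding of the Advanced SIMD instructions.

```c
/* recip_estimate() as given in this section: an estimate of 1/a,
 * a multiple of 1/256, for 0.5 <= a < 1.0. */
static double recip_estimate(double a)
{
    int q, s;
    double r;
    q = (int)(a * 512.0);                   /* a in units of 1/512 rounded down */
    r = 1.0 / (((double)q + 0.5) / 512.0);  /* reciprocal r */
    s = (int)(256.0 * r + 0.5);             /* r in units of 1/256 rounded to nearest */
    return (double)s / 256.0;
}

/* Hypothetical driver: start from the estimate and apply `steps`
 * iterations of x = x * (2 - d*x). The factor (2 - d*x) is the
 * 2 - op1*op2 quantity. Assumes 0.5 <= d < 1.0. */
static double recip_refine(double d, int steps)
{
    double x = recip_estimate(d);
    for (int i = 0; i < steps; i++)
        x = x * (2.0 - d * x);
    return x;
}
```

Because the relative error is squared by each step, an estimate that is good to about 8 bits reaches the limit of double precision within three steps in this sketch.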
The VRECPS instruction performs a 2 - op1*op2 calculation and can be used with a multiplication to perform a step of this iteration. The functionality of this instruction is defined by the following pseudocode function:

// FPRecipStep()
// =============

bits(32) FPRecipStep(bits(32) op1, bits(32) op2)
    (type1,sign1,value1) = FPUnpack(op1, StandardFPSCRValue());
    (type2,sign2,value2) = FPUnpack(op2, StandardFPSCRValue());
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, StandardFPSCRValue());
    if !done then
        inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
        if (inf1 && zero2) || (zero1 && inf2) then
            product = FPZero('0', 32);
        else
            product = FPMul(op1, op2, FALSE);
        result = FPSub(FPTwo(32), product, FALSE);
    return result;

Table A2-10 shows the results where input values are out of range.

Table A2-10 VRECPS results for out-of-range inputs

Input Vn[i]                        Input Vm[i]                        Result Vd[i]
Any NaN                            -                                  Default NaN
-                                  Any NaN                            Default NaN
+/– 0.0 or denormalized number     +/– infinity                       2.0
+/– infinity                       +/– 0.0 or denormalized number     2.0

Square root

// FPSqrt()
// ========

bits(N) FPSqrt(bits(N) operand, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type,sign,value) = FPUnpack(operand, fpscr_val);
    if type == FPType_SNaN || type == FPType_QNaN then
        result = FPProcessNaN(type, operand, fpscr_val);
    elsif type == FPType_Zero || (type == FPType_Infinity && sign == '0') then
        result = operand;
    elsif sign == '1' then
        result = FPDefaultNaN(N);
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        result = FPRound(Sqrt(value), N, fpscr_val);
    return result;
Reciprocal square root

The Advanced SIMD extension includes instructions that support Newton-Raphson calculation of the reciprocal of the square root of a number.

The VRSQRTE instruction produces the initial estimate of the reciprocal of the square root. It uses the following pseudocode functions:

// FPRSqrtEstimate()
// =================

bits(32) FPRSqrtEstimate(bits(32) operand)
    (type,sign,value) = FPUnpack(operand, StandardFPSCRValue());
    if type == FPType_SNaN || type == FPType_QNaN then
        result = FPProcessNaN(type, operand, StandardFPSCRValue());
    elsif type == FPType_Zero then
        result = FPInfinity(sign, 32);
        FPProcessException(FPExc_DivideByZero, StandardFPSCRValue());
    elsif sign == '1' then
        result = FPDefaultNaN(32);
        FPProcessException(FPExc_InvalidOp, StandardFPSCRValue());
    elsif type == FPType_Infinity then
        result = FPZero('0', 32);
    else
        // Operand must be normalized, since denormalized numbers are flushed to zero. Scale to a
        // double-precision value in the range 0.25 <= x < 1.0, with the evenness or oddness of
        // the exponent unchanged, and calculate result exponent. Scaled value has copied sign
        // bit, exponent = 1022 or 1021 = double-precision biased version of -1 or -2, fraction
        // = original fraction extended with zeros.
        if operand<23> == '0' then
            scaled = operand<31> : '01111111110' : operand<22:0> : Zeros(29);
        else
            scaled = operand<31> : '01111111101' : operand<22:0> : Zeros(29);
        result_exp = (380 - UInt(operand<30:23>)) DIV 2;

        // Call C function to get reciprocal estimate of scaled value.
        estimate = recip_sqrt_estimate(scaled);

        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256. Convert
        // to scaled single-precision result with copied sign bit and high-order fraction bits,
        // and exponent calculated above.
        result = estimate<63> : result_exp<7:0> : estimate<51:29>;
    return result;

// UnsignedRSqrtEstimate()
// =======================

bits(32) UnsignedRSqrtEstimate(bits(32) operand)
    if operand<31:30> == '00' then
        // Operands <= 0x3FFFFFFF produce 0xFFFFFFFF
        result = Ones(32);
    else
        // Generate double-precision value = operand * 2^-32. This has zero sign bit,
        // exponent = 1022 or 1021 = double-precision biased version of -1 or -2,
        // fraction taken from operand, excluding its most significant one or two bits.
        if operand<31> == '1' then
            dp_operand = '0 01111111110' : operand<30:0> : Zeros(21);
        else // operand<31:30> == '01'
            dp_operand = '0 01111111101' : operand<29:0> : Zeros(22);

        // Call C function to get reciprocal estimate of scaled value.
        estimate = recip_sqrt_estimate(dp_operand);

        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256.
        // Multiply by 2^31 and convert to an unsigned integer - this just involves
        // concatenating the implicit units bit with the top 31 fraction bits.
        result = '1' : estimate<51:21>;
    return result;

where recip_sqrt_estimate() is defined by the following C function:

double recip_sqrt_estimate(double a)
{
    int q0, q1, s;
    double r;
    if (a < 0.5)                                     /* range 0.25 <= a < 0.5 */
    {
        q0 = (int)(a * 512.0);                       /* a in units of 1/512 rounded down */
        r = 1.0 / sqrt(((double)q0 + 0.5) / 512.0);  /* reciprocal root r */
    }
    else                                             /* range 0.5 <= a < 1.0 */
    {
        q1 = (int)(a * 256.0);                       /* a in units of 1/256 rounded down */
        r = 1.0 / sqrt(((double)q1 + 0.5) / 256.0);  /* reciprocal root r */
    }
    s = (int)(256.0 * r + 0.5);                      /* r in units of 1/256 rounded to nearest */
    return (double)s / 256.0;
}

Table A2-11 shows the results where input values are out of range.

Table A2-11 VRSQRTE results for out-of-range inputs

Number type      Input Vm[i]                              Result Vd[i]
Integer          <= 0x3FFFFFFF                            0xFFFFFFFF
Floating-point   NaN, – normalized number, – infinity     Default NaN
Floating-point   – 0 or – denormalized number             – infinity (a)
Floating-point   + 0 or + denormalized number             + infinity (a)
Floating-point   + infinity                               + 0

a. The Division by Zero exception bit in the FPSCR (FPSCR[1]) is set.

The Newton-Raphson iteration:

    x_{n+1} = x_n (3 - d x_n^2) / 2

converges to (1/√d) if x_0 is the result of VRSQRTE applied to d.

The VRSQRTS instruction performs a (3 – op1*op2)/2 calculation and can be used with two multiplications to perform a step of this iteration. The functionality of this instruction is defined by the following pseudocode function:

// FPRSqrtStep()
// =============

bits(32) FPRSqrtStep(bits(32) op1, bits(32) op2)
    (type1,sign1,value1) = FPUnpack(op1, StandardFPSCRValue());
    (type2,sign2,value2) = FPUnpack(op2, StandardFPSCRValue());
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, StandardFPSCRValue());
    if !done then
        inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
        if (inf1 && zero2) || (zero1 && inf2) then
            product = FPZero('0', 32);
        else
            product = FPMul(op1, op2, FALSE);
        result = FPDiv(FPSub(FPThree(32), product, FALSE), FPTwo(32), FALSE);
    return result;

Table A2-12 shows the results where input values are out of range.

Table A2-12 VRSQRTS results for out-of-range inputs

Input Vn[i]                        Input Vm[i]                        Result Vd[i]
Any NaN                            -                                  Default NaN
-                                  Any NaN                            Default NaN
+/– 0.0 or denormalized number     +/– infinity                       1.5
+/– infinity                       +/– 0.0 or denormalized number     1.5
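The way a (3 - op1*op2)/2 step combines with two multiplications can be sketched in C on host float values as follows. The names rsqrt_step and rsqrt_refine are hypothetical. The initial estimate x0 is supplied by the caller and stands in for a VRSQRTE result, so the sketch needs no square root function. It treats an infinity and zero operand pair as having a zero product, which gives the 1.5 result listed in Table A2-12, but it does not model flush-to-zero, Default NaN generation or the exception flags.

```c
#include <math.h>

/* Sketch of the (3 - op1*op2)/2 step on host floats.
 * An (infinity, zero) pair is given a product of zero, so the
 * result is 1.5. Not modelled: flush-to-zero, Default NaN, flags. */
static float rsqrt_step(float op1, float op2)
{
    float product;
    if ((isinf(op1) && op2 == 0.0f) || (op1 == 0.0f && isinf(op2)))
        product = 0.0f;
    else
        product = op1 * op2;
    return (3.0f - product) / 2.0f;
}

/* Hypothetical driver: refine x0, an initial estimate of 1/sqrt(d).
 * Each pass uses two multiplications (d*x, and x times the step
 * result) around one step. */
static float rsqrt_refine(float d, float x0, int steps)
{
    float x = x0;
    for (int i = 0; i < steps; i++)
        x = x * rsqrt_step(d * x, x);
    return x;
}
```

For example, with d = 0.25 and a deliberately rough starting value of 1.9, a handful of passes brings x to within single-precision rounding of 2.0.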
A2-63 Application Level Programmers’ Model Conversions The following functions perform conversions between half-precision and single-precision floating-point numbers. // FPHalfToSingle() // ================ bits(32) FPHalfToSingle(bits(16) operand, boolean fpscr_controlled) fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue(); (type,sign,value) = FPUnpack(operand, fpscr_val); if type == FPType_SNaN || type == FPType_QNaN then if fpscr_val<25> == ‘1’ then // DN bit set result = FPDefaultNaN(32); else result = sign : ‘11111111 1’ : operand<8:0> : Zeros(13); if type == FPType_SNaN then FPProcessException(FPExc_InvalidOp, fpscr_val); elsif type = FPType_Infinity then result = FPInfinity(sign, 32); elsif type = FPType_Zero then result = FPZero(sign, 32); else result = FPRound(value, 32, fpscr_val); // Rounding will be exact return result; // FPSingleToHalf() // ================ bits(16) FPSingleToHalf(bits(32) operand, boolean fpscr_controlled) fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue(); (type,sign,value) = FPUnpack(operand, fpscr_val); if type == FPType_SNaN || type == FPType_QNaN then if fpscr_val<26> == ‘1’ then // AH bit set result = FPZero(sign, 16); elsif fpscr_val<25> == ‘1’ then // DN bit set result = FPDefaultNaN(16); else result = sign : ‘11111 1’ : operand<21:13>; if type == FPType_SNaN || fpscr_val<26> == ‘1’ then FPProcessException(FPExc_InvalidOp, fpscr_val); elsif type = FPType_Infinity then if fpscr_val<26> == ‘1’ then // AH bit set result = sign : Ones(15); FPProcessException(FPExc_InvalidOp, fpscr_val); else result = FPInfinity(sign, 16); elsif type = FPType_Zero then result = FPZero(sign, 16); else result = FPRound(value, 16, fpscr_val); return result; A2-64 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Application Level Programmers’ Model The following functions perform conversions between single-precision and double-precision floating-point numbers. 
    // FPSingleToDouble()
    // ==================

    bits(64) FPSingleToDouble(bits(32) operand, boolean fpscr_controlled)
        fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
        (type,sign,value) = FPUnpack(operand, fpscr_val);
        if type == FPType_SNaN || type == FPType_QNaN then
            if fpscr_val<25> == '1' then          // DN bit set
                result = FPDefaultNaN(64);
            else
                result = sign : '11111111111 1' : operand<21:0> : Zeros(29);
            if type == FPType_SNaN then
                FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif type == FPType_Infinity then
            result = FPInfinity(sign, 64);
        elsif type == FPType_Zero then
            result = FPZero(sign, 64);
        else
            result = FPRound(value, 64, fpscr_val);   // Rounding will be exact
        return result;

    // FPDoubleToSingle()
    // ==================

    bits(32) FPDoubleToSingle(bits(64) operand, boolean fpscr_controlled)
        fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
        (type,sign,value) = FPUnpack(operand, fpscr_val);
        if type == FPType_SNaN || type == FPType_QNaN then
            if fpscr_val<25> == '1' then          // DN bit set
                result = FPDefaultNaN(32);
            else
                result = sign : '11111111 1' : operand<50:29>;
            if type == FPType_SNaN then
                FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif type == FPType_Infinity then
            result = FPInfinity(sign, 32);
        elsif type == FPType_Zero then
            result = FPZero(sign, 32);
        else
            result = FPRound(value, 32, fpscr_val);
        return result;

The following functions perform conversions between floating-point numbers and integers or fixed-point numbers:

    // FPToFixed()
    // ===========

    bits(M) FPToFixed(bits(N) operand, integer M, integer fraction_bits, boolean unsigned,
                      boolean round_towards_zero, boolean fpscr_controlled)
        assert N == 32 || N == 64;
        fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
        if round_towards_zero then fpscr_val<23:22> = '11';
        (type,sign,value) = FPUnpack(operand, fpscr_val);

        // For NaNs and infinities, FPUnpack() has produced a value that will round to the
        // required result of the conversion. Also, the value produced for infinities will
        // cause the conversion to overflow and signal an Invalid Operation floating-point
        // exception as required. NaNs must also generate such a floating-point exception.
        if type == FPType_SNaN || type == FPType_QNaN then
            FPProcessException(FPExc_InvalidOp, fpscr_val);

        // Scale value by specified number of fraction bits, then start rounding to an integer
        // and determine the rounding error.
        value = value * 2^fraction_bits;
        int_result = RoundDown(value);
        error = value - int_result;

        // Apply the specified rounding mode.
        case fpscr_val<23:22> of
            when '00'   // Round to Nearest (rounding to even if exactly halfway)
                round_up = (error > 0.5 || (error == 0.5 && int_result<0> == '1'));
            when '01'   // Round towards Plus Infinity
                round_up = (error != 0.0);
            when '10'   // Round towards Minus Infinity
                round_up = FALSE;
            when '11'   // Round towards Zero
                round_up = (error != 0.0 && int_result < 0);
        if round_up then int_result = int_result + 1;

        // Bitstring result is the integer result saturated to the destination size, with
        // saturation indicating overflow of the conversion (signaled as an Invalid
        // Operation floating-point exception).
        (result, overflow) = SatQ(int_result, M, unsigned);
        if overflow then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif error != 0 then
            FPProcessException(FPExc_Inexact, fpscr_val);
        return result;

    // FixedToFP()
    // ===========

    bits(N) FixedToFP(bits(M) operand, integer N, integer fraction_bits, boolean unsigned,
                      boolean round_to_nearest, boolean fpscr_controlled)
        assert N == 32 || N == 64;
        fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
        if round_to_nearest then fpscr_val<23:22> = '00';
        int_operand = if unsigned then UInt(operand) else SInt(operand);
        real_operand = int_operand / 2^fraction_bits;
        if real_operand == 0.0 then
            result = FPZero('0', N);
        else
            result = FPRound(real_operand, N, fpscr_val);
        return result;

A2.8 Polynomial arithmetic over {0,1}

The polynomial data type represents a polynomial in x of the form $b_{n-1}x^{n-1} + \dots + b_1 x + b_0$ where $b_k$ is bit [k] of the value.

The coefficients 0 and 1 are manipulated using the rules of Boolean arithmetic:
• 0 + 0 = 1 + 1 = 0
• 0 + 1 = 1 + 0 = 1
• 0 * 0 = 0 * 1 = 1 * 0 = 0
• 1 * 1 = 1.

That is:
• adding two polynomials over {0,1} is the same as a bitwise exclusive OR
• multiplying two polynomials over {0,1} is the same as integer multiplication except that partial products are exclusive-ORed instead of being added.

A2.8.1 Pseudocode details of polynomial multiplication

In pseudocode, polynomial addition is described by the EOR operation on bitstrings.

Polynomial multiplication is described by the PolynomialMult() function:

    // PolynomialMult()
    // ================

    bits(M+N) PolynomialMult(bits(M) op1, bits(N) op2)
        result = Zeros(M+N);
        extended_op2 = Zeros(M) : op2;
        for i=0 to M-1
            if op1<i> == '1' then
                result = result EOR LSL(extended_op2, i);
        return result;
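As an illustration of the rule that partial products are exclusive-ORed rather than added, the PolynomialMult() pseudocode can be mirrored with Python integers standing in for bitstrings. This is a sketch only: the function name and the explicit width parameters m and n are assumptions made for the example.

```python
def polynomial_mult(op1, op2, m, n):
    """Multiply two polynomials over {0,1}.

    op1 is treated as an m-bit value and op2 as an n-bit value; the result
    is an (m+n)-bit value, as in the PolynomialMult() pseudocode.
    """
    op1 &= (1 << m) - 1
    op2 &= (1 << n) - 1
    result = 0
    for i in range(m):
        if (op1 >> i) & 1:
            # XOR in the partial product op2 * x^i instead of adding it.
            result ^= op2 << i
    return result & ((1 << (m + n)) - 1)


if __name__ == "__main__":
    # (x^2 + x + 1) * (x + 1) over {0,1}
    print(bin(polynomial_mult(0b111, 0b11, 3, 2)))
    # The same bit patterns under ordinary integer multiplication
    print(bin(0b111 * 0b11))
```

For the operands 0b111 and 0b11 the two partial products are 0b111 and 0b1110. Exclusive-ORing them cancels the x^2 and x terms, whereas integer addition would carry.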
A2.9 Coprocessor support

Coprocessor space is used to extend the functionality of an ARM processor. There are sixteen coprocessors defined in the coprocessor instruction space. These are commonly known as CP0 to CP15. The following coprocessors are reserved by ARM for specific purposes:

• Coprocessor 15 (CP15) provides system control functionality. This includes architecture and feature identification, as well as control, status information and configuration support. The following sections describe CP15:
    - CP15 registers for a VMSA implementation on page B3-64
    - CP15 registers for a PMSA implementation on page B4-22.
  CP15 also provides performance monitor registers, see Chapter C9 Performance Monitors.
• Coprocessor 14 (CP14) supports:
    - debug, see Chapter C6 Debug Register Interfaces
    - the execution environment features defined by the architecture, see Execution environment support on page A2-69.
• Coprocessor 11 (CP11) supports double-precision floating-point operations.
• Coprocessor 10 (CP10) supports single-precision floating-point operations and the control and configuration of both the VFP and the Advanced SIMD architecture extensions.
• Coprocessors 8, 9, 12, and 13 are reserved for future use by ARM.

Note
Any implementation that includes either or both of the Advanced SIMD extension and the VFP extension must enable access to both CP10 and CP11, see Enabling Advanced SIMD and floating-point support on page B1-64.

In general, privileged access is required for:
• system control through CP15
• debug control and configuration
• access to the identification registers
• access to any register bits that enable or disable coprocessor features.

For details of the exact split between the privileged and unprivileged coprocessor operations see the relevant sections of this manual.

All load, store, branch and data operation instructions associated with floating-point, Advanced SIMD and execution environment support can execute unprivileged.

Coprocessors 0 to 7 can be used to provide vendor-specific features.

A2.10 Execution environment support

The Jazelle and ThumbEE states, introduced in ISETSTATE on page A2-15, support execution environments:

• The ThumbEE state is more generic, supporting a variant of the Thumb instruction set that minimizes the code size overhead generated by a Just-In-Time (JIT) or Ahead-Of-Time (AOT) compiler. JIT and AOT compilers convert execution environment source code to a native executable. For more information, see Thumb Execution Environment.
• The Jazelle state is specific to hardware acceleration of Java bytecodes. For more information, see Jazelle direct bytecode execution support on page A2-73.

A2.10.1 Thumb Execution Environment

Thumb Execution Environment (ThumbEE) is a variant of the Thumb instruction set designed as a target for dynamically generated code. This is code that is compiled on the device, from a portable bytecode or other intermediate or native representation, either shortly before or during execution. ThumbEE provides support for Just-In-Time (JIT), Dynamic Adaptive Compilation (DAC) and Ahead-Of-Time (AOT) compilers, but cannot interwork freely with the ARM and Thumb instruction sets.

ThumbEE is particularly suited to languages that feature managed pointers and array types.

ThumbEE executes instructions in the ThumbEE instruction set state. For information about instruction set states see ISETSTATE on page A2-15.

See Thumb Execution Environment on page B1-73 for system level information about ThumbEE.

ThumbEE instructions

In ThumbEE state, the processor executes almost the same instruction set as in Thumb state. However, some instructions behave differently, some are removed, and some ThumbEE instructions are added.

The key differences are:
• additional instructions to change instruction set in both Thumb state and ThumbEE state
• new ThumbEE instructions to branch to handlers
• null pointer checking on load/store instructions executed in ThumbEE state
• an additional instruction in ThumbEE state to check array bounds
• some other modifications to load, store, and control flow instructions.

For more information about the ThumbEE instructions see Chapter A9 ThumbEE.

ThumbEE configuration

ThumbEE introduces two new registers:
• ThumbEE Configuration Register, TEECR. This contains a single bit, the ThumbEE configuration control bit, XED.
• ThumbEE Handler Base Register. This contains the base address for ThumbEE handlers. A handler is a short, commonly executed, sequence of instructions. It is typically, but not always, associated directly with one or more bytecodes or other intermediate language elements.

Changes to these CP14 registers have the same synchronization requirements as changes to the CP15 registers. These are described in:
• Changes to CP15 registers and the memory order model on page B3-77 for a VMSA implementation
• Changes to CP15 registers and the memory order model on page B4-28 for a PMSA implementation.

ThumbEE is an unprivileged, user-level facility, and there are no special provisions for using it securely. For more information, see ThumbEE and the Security Extensions on page B1-73.

ThumbEE Configuration Register (TEECR)

The ThumbEE Configuration Register (TEECR) controls unprivileged access to the ThumbEE Handler Base Register.
The TEECR is:
• a CP14 register
• a 32-bit register, with access rights that depend on the current privilege:
    - the result of an unprivileged write to the register is UNDEFINED
    - unprivileged reads, and privileged reads and writes, are permitted.
• when the Security Extensions are implemented, a Common register.

The format of the TEECR is:

    Bits [31:1]   UNK/SBZP
    Bit  [0]      XED

Bits [31:1]    UNK/SBZP.
XED, bit [0]   Execution Environment Disable bit. Controls unprivileged access to the ThumbEE Handler Base Register:
               0   Unprivileged access permitted.
               1   Unprivileged access disabled.
               The reset value of this bit is 0.

The effects of a write to this register on ThumbEE configuration are only guaranteed to be visible to subsequent instructions after the execution of an ISB instruction, an exception entry or an exception return. However, a read of this register always returns the value most recently written to the register.

To access the TEECR, read or write the CP14 registers with an MRC or MCR instruction with <opc1> set to 6, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 0. For example:

    MRC p14, 6, <Rt>, c0, c0, 0    ; Read ThumbEE Configuration Register
    MCR p14, 6, <Rt>, c0, c0, 0    ; Write ThumbEE Configuration Register

ThumbEE Handler Base Register (TEEHBR)

The ThumbEE Handler Base Register (TEEHBR) holds the base address for ThumbEE handlers.

The TEEHBR is:
• a CP14 register
• a 32-bit read/write register, with access rights that depend on the current privilege and the value of the TEECR.XED bit:
    - privileged accesses are always permitted
    - when TEECR.XED == 0, unprivileged accesses are permitted
    - when TEECR.XED == 1, the result of an unprivileged access is UNDEFINED.
• when the Security Extensions are implemented, a Common register.

The format of the TEEHBR is:

    Bits [31:2]   HandlerBase
    Bits [1:0]    SBZ

HandlerBase, bits [31:2]   The address of the ThumbEE Handler_00 implementation. This is the address of the first of the ThumbEE handlers. The reset value of this field is UNKNOWN.
bits [1:0]                 Reserved, SBZ.

The effects of a write to this register on ThumbEE handler entry are only guaranteed to be visible to subsequent instructions after the execution of an ISB instruction, an exception entry or an exception return. However, a read of this register always returns the value most recently written to the register.

To access the TEEHBR, read or write the CP14 registers with an MRC or MCR instruction with <opc1> set to 6, <CRn> set to c1, <CRm> set to c0, and <opc2> set to 0. For example:

    MRC p14, 6, <Rt>, c1, c0, 0    ; Read ThumbEE Handler Base Register
    MCR p14, 6, <Rt>, c1, c0, 0    ; Write ThumbEE Handler Base Register

Use of HandlerBase

ThumbEE handlers are entered by reference to a HandlerBase address, defined by the TEEHBR. See ThumbEE Handler Base Register (TEEHBR) on page A2-71. Table A2-13 shows how the handlers are arranged in relation to the value of HandlerBase:

Table A2-13 Access to ThumbEE handlers

Offset from HandlerBase   Name           Value stored
-0x0008                   IndexCheck     Branch to IndexCheck handler
-0x0004                   NullCheck      Branch to NullCheck handler
+0x0000                   Handler_00     Implementation of Handler_00
+0x0020                   Handler_01     Implementation of Handler_01
...                       ...            ...
+(0x0000 + 32n)           Handler_<n>    Implementation of Handler_<n>
...                       ...            Implementation of additional handlers

The IndexCheck occurs when a CHKA instruction detects an index out of range. For more information, see CHKA on page A9-15.

The NullCheck occurs when any memory access instruction is executed with a value of 0 in the base register. For more information, see Null checking on page A9-3.

Note
Checks are similar to conditional branches, with the added property that they clear the IT bits when taken.

Other handlers are called using explicit handler call instructions.
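The layout in Table A2-13 amounts to simple address arithmetic on the TEEHBR value. The sketch below shows that arithmetic, together with the TEECR.XED test for unprivileged access. All function and constant names are hypothetical, and the sample register value is made up for the example.

```python
HANDLER_STRIDE = 32           # bytes between Handler_<n> entries (Table A2-13)
INDEX_CHECK_OFFSET = -0x0008  # IndexCheck entry sits below HandlerBase
NULL_CHECK_OFFSET = -0x0004   # NullCheck entry sits below HandlerBase
MASK32 = 0xFFFFFFFF


def handler_base(teehbr):
    """Address of Handler_00: TEEHBR bits [31:2], with bits [1:0] (SBZ) cleared."""
    return teehbr & 0xFFFFFFFC


def handler_address(teehbr, n):
    """Address of Handler_<n>, at offset +(0x0000 + 32n) from HandlerBase."""
    return (handler_base(teehbr) + HANDLER_STRIDE * n) & MASK32


def index_check_address(teehbr):
    """Address of the branch to the IndexCheck handler."""
    return (handler_base(teehbr) + INDEX_CHECK_OFFSET) & MASK32


def null_check_address(teehbr):
    """Address of the branch to the NullCheck handler."""
    return (handler_base(teehbr) + NULL_CHECK_OFFSET) & MASK32


def unprivileged_teehbr_access_permitted(teecr):
    """True when TEECR.XED (bit [0]) is 0, so unprivileged TEEHBR access is permitted."""
    return (teecr & 1) == 0


if __name__ == "__main__":
    teehbr = 0x80001000  # made-up example value
    for n in range(3):
        print(n, hex(handler_address(teehbr, n)))
```

Because the two check entries are only 4 bytes apart, each holds a single branch, while every numbered handler has a 32-byte slot for its implementation.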
For details of the handler call instructions, see the following sections:
• HB, HBL on page A9-16
• HBLP on page A9-17
• HBP on page A9-18.

A2.10.2 Jazelle direct bytecode execution support

From ARMv5TEJ, the architecture requires every system to include an implementation of the Jazelle extension. The Jazelle extension provides architectural support for hardware acceleration of bytecode execution by a Java Virtual Machine (JVM).

In the simplest implementations of the Jazelle extension, the processor does not accelerate the execution of any bytecodes, and the JVM uses software routines to execute all bytecodes. Such an implementation is called a trivial implementation of the Jazelle extension, and has minimal additional cost compared with not implementing the Jazelle extension at all. An implementation that provides hardware acceleration of bytecode execution is a non-trivial Jazelle implementation.

These requirements for the Jazelle extension mean a JVM can be written to both:
• function correctly on all processors that include a Jazelle extension implementation
• automatically take advantage of the accelerated bytecode execution provided by a processor that includes a non-trivial implementation.

Typically, a non-trivial implementation of the Jazelle extension implements a subset of the bytecodes in hardware, choosing bytecodes that:
• can have simple hardware implementations
• account for a large percentage of bytecode execution time.

The required features of a non-trivial implementation are:
• provision of the Jazelle state
• a new instruction, BXJ, to enter Jazelle state
• system support that enables an operating system to regulate the use of the Jazelle extension hardware
• system support that enables a JVM to configure the Jazelle extension hardware to its specific needs.

The required features of a trivial implementation are:
• Normally, the Jazelle instruction set state is never entered. If an incorrect exception return causes entry to the Jazelle instruction set state, the next instruction executed is treated as UNDEFINED.
• The BXJ instruction behaves as a BX instruction.
• Configuration support that maintains the interface to the Jazelle extension is permanently disabled.

For more information about trivial implementations see Trivial implementation of the Jazelle extension on page B1-81.

A JVM that has been written to take advantage automatically of hardware-accelerated bytecode execution is known as an Enabled JVM (EJVM).

Subarchitectures

A processor implementation that includes the Jazelle extension expects the general-purpose register values and other resources of the ARM processor to conform to an interface standard defined by the Jazelle implementation when Jazelle state is entered and exited. For example, a specific general-purpose register might be reserved for use as the pointer to the current bytecode.

In order for an EJVM and associated debug support to function correctly, it must be written to comply with the interface standard defined by the acceleration hardware at Jazelle state execution entry and exit points.

An implementation of the Jazelle extension might define other configuration registers in addition to the architecturally defined ones.

The interface standard and any additional configuration registers used to communicate with the Jazelle extension are known collectively as the subarchitecture of the implementation. They are not described in this manual. Only EJVM implementations and debug or similar software can depend on the subarchitecture. All other software must rely only on the architectural definition of the Jazelle extension given in this manual.
A particular subarchitecture is identified by reading the JIDR described in Jazelle ID Register (JIDR) on page A2-76.

Jazelle state

While the processor is in Jazelle state, it executes bytecode programs. A bytecode program is defined as an executable object that comprises one or more class files, or is derived from and functionally equivalent to one or more class files. See Lindholm and Yellin, The Java Virtual Machine Specification 2nd Edition for the definition of class files.

While the processor is in Jazelle state, the PC identifies the next JVM bytecode to be executed. A JVM bytecode is a bytecode defined in Lindholm and Yellin, or a functionally equivalent transformed version of a bytecode defined in Lindholm and Yellin.

For the Jazelle extension, the functionality of Native methods, as described in Lindholm and Yellin, must be specified using only instructions from the ARM, Thumb, and ThumbEE instruction sets.

An implementation of the Jazelle extension must not be documented or promoted as performing any task while it is in Jazelle state other than the acceleration of bytecode programs in accordance with this section and The Java Virtual Machine Specification.

Jazelle state entry instruction, BXJ

ARMv7 includes an ARM instruction similar to BX. The BXJ instruction has a single register operand that specifies a target instruction set state, ARM state or Thumb state, and branch target address for use if entry to Jazelle state is not available. For more information, see BXJ on page A8-64.

Correct entry into Jazelle state involves the EJVM executing the BXJ instruction at a time when both:
• the Jazelle extension Control and Configuration registers are initialized correctly, see Application level configuration and control of the Jazelle extension on page A2-75
• application level registers and any additional configuration registers are initialized as required by the subarchitecture of the implementation.

Executing BXJ with Jazelle extension enabled

Executing a BXJ instruction when the JMCR.JE bit is 1, see Jazelle Main Configuration Register (JMCR) on page A2-77, causes the Jazelle hardware to do one of the following:
• enter Jazelle state and start executing bytecodes directly from a SUBARCHITECTURE DEFINED address
• branch to a SUBARCHITECTURE DEFINED handler.

Which of these occurs is SUBARCHITECTURE DEFINED.

The Jazelle subarchitecture can use Application Level registers (but not System Level registers) to transfer information between the Jazelle extension and the EJVM. There are SUBARCHITECTURE DEFINED restrictions on what Application Level registers must contain when a BXJ instruction is executed, and Application Level registers have SUBARCHITECTURE DEFINED values when Jazelle state execution ends and ARM or Thumb state execution resumes.

Jazelle subarchitectures and implementations must not use any unallocated bits in Application Level registers such as the CPSR or FPSCR. All such bits are reserved for future expansion of the ARM architecture.

Executing BXJ with Jazelle extension disabled

If a BXJ instruction is executed when the JMCR.JE bit is 0, it is executed identically to a BX instruction with the same register operand.

This means that BXJ instructions can be executed freely when the JMCR.JE bit is 0. In particular, if an EJVM determines that it is executing on a processor whose Jazelle extension implementation is trivial or uses an incompatible subarchitecture, it can set JE == 0 and execute correctly. In this case it executes without the benefit of any Jazelle hardware acceleration that might be present.
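The decision described above, whether an EJVM should run with JE == 0, can be sketched as a decode of the JIDR value followed by a compatibility test. The field positions follow the JIDR format given in Jazelle ID Register (JIDR). The function names, the set of supported subarchitectures, and the sample JIDR values are assumptions made for this illustration only.

```python
TRIVIAL_IMPLEMENTER = 0x00  # Implementer code of a trivial implementation


def decode_jidr(jidr):
    """Split a 32-bit JIDR value into its architecturally defined fields."""
    return {
        "architecture": (jidr >> 28) & 0xF,      # bits [31:28]
        "implementer": (jidr >> 20) & 0xFF,      # bits [27:20]
        "subarchitecture": (jidr >> 12) & 0xFF,  # bits [19:12]
        "subarch_defined": jidr & 0xFFF,         # bits [11:0]
    }


def choose_je(jidr, supported):
    """Return the JE value a hypothetical EJVM would write to the JMCR.

    supported is a set of (implementer, subarchitecture) pairs that this
    EJVM was written for. A trivial implementation or an unknown pair
    gives JE == 0, so that BXJ behaves as BX.
    """
    fields = decode_jidr(jidr)
    if fields["implementer"] == TRIVIAL_IMPLEMENTER:
        return 0
    key = (fields["implementer"], fields["subarchitecture"])
    return 1 if key in supported else 0


if __name__ == "__main__":
    supported = {(0x41, 0x00)}                 # assumed: one known subarchitecture
    sample = (0xF << 28) | (0x41 << 20)        # made-up JIDR value
    print(decode_jidr(sample), choose_je(sample, supported))
```

A real EJVM would also have to apply whatever SUBARCHITECTURE DEFINED configuration its subarchitecture requires before setting JE to 1. That part is outside the architectural definition and is not modelled here.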
Application level configuration and control of the Jazelle extension

All registers associated with the Jazelle extension are implemented in coprocessor space as part of coprocessor 14 (CP14). The registers are accessed using the instructions:
• MCR, see MCR, MCR2 on page A8-186
• MRC, see MRC, MRC2 on page A8-202.

In a non-trivial implementation at least three registers are required. These are described in:
• Jazelle ID Register (JIDR) on page A2-76
• Jazelle Main Configuration Register (JMCR) on page A2-77
• Jazelle OS Control Register (JOSCR) on page B1-77.

Additional configuration registers might be provided and are SUBARCHITECTURE DEFINED.

The following rules apply to all Jazelle extension control and configuration registers:

• All configuration registers are accessed by CP14 MRC and MCR instructions with <opc1> set to 7.
• The values contained in configuration registers are changed only by the execution of MCR instructions. In particular, they are never changed by Jazelle state execution of bytecodes.
• The access policy for the required registers is fully defined in their descriptions. With unprivileged operation:
    - all MCR accesses to the JIDR are UNDEFINED
    - MRC and MCR accesses that are restricted to privileged modes are UNDEFINED.
  The access policy of other configuration registers is SUBARCHITECTURE DEFINED.
• When the Security Extensions are implemented, the registers are common to the Secure and Non-secure security states. For more information, see Effect of the Security Extensions on the CP15 registers on page B3-71. This section applies to some CP14 registers as well as to the CP15 registers.
• When a configuration register is readable, reading the register returns the last value written to it. Reading a readable configuration register has no side effects. When a configuration register is not readable, attempting to read it returns an UNKNOWN value.
• When a configuration register can be written, the effect of writing to it must be idempotent. That is, the overall effect of writing the same value more than once must not differ from the effect of writing it once.

Changes to these CP14 registers have the same synchronization requirements as changes to the CP15 registers. These are described in:
• Changes to CP15 registers and the memory order model on page B3-77 for a VMSA implementation
• Changes to CP15 registers and the memory order model on page B4-28 for a PMSA implementation.

For more information, see Jazelle state configuration and control on page B1-77.

Jazelle ID Register (JIDR)

The Jazelle ID Register (JIDR) enables an EJVM to determine the architecture and subarchitecture under which it is running.

The JIDR is:
• a CP14 register
• a 32-bit read-only register
• accessible during privileged and unprivileged execution
• when the Security Extensions are implemented, a Common register, see Common CP15 registers on page B3-74.

The format of the JIDR is:

    Bits [31:28]   Architecture
    Bits [27:20]   Implementer
    Bits [19:12]   Subarchitecture
    Bits [11:0]    SUBARCHITECTURE DEFINED

Architecture, bits [31:28]   Architecture code. This uses the same Architecture code that appears in the Main ID register in coprocessor 15, see c0, Main ID Register (MIDR) on page B3-81 (VMSA implementation) or c0, Main ID Register (MIDR) on page B4-32 (PMSA implementation).
Implementer, bits [27:20]    Implementer code of the designer of the subarchitecture. This uses the same Implementer code that appears in the Main ID register in coprocessor 15, see c0, Main ID Register (MIDR) on page B3-81 (VMSA implementation) or c0, Main ID Register (MIDR) on page B4-32 (PMSA implementation).
                             If the trivial implementation of the Jazelle extension is used, the Implementer code is 0x00.
Subarchitecture, bits [19:12]   Contain the subarchitecture code. The following subarchitecture code is defined:
                             0x00   Jazelle v1 subarchitecture, or trivial implementation of Jazelle extension if Implementer code is 0x00.
bits [11:0]                  Contain additional SUBARCHITECTURE DEFINED information.

To access the JIDR, read the CP14 registers with an MRC instruction with <opc1> set to 7, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 0. For example:

    MRC p14, 7, <Rt>, c0, c0, 0    ; Read Jazelle ID register

Jazelle Main Configuration Register (JMCR)

The Jazelle Main Configuration Register (JMCR) controls the Jazelle extension.

The JMCR is:
• a CP14 register
• a 32-bit register, with access rights that depend on the current privilege:
    - for privileged operations the register is read/write
    - for unprivileged operations, the register is normally write-only
• when the Security Extensions are implemented, a Common register, see Common CP15 registers on page B3-74.

For more information about unprivileged access restrictions see Access to Jazelle registers on page A2-78.

The format of the JMCR is:

    Bits [31:1]   SUBARCHITECTURE DEFINED
    Bit  [0]      JE

bits [31:1]    SUBARCHITECTURE DEFINED information.
JE, bit [0]    Jazelle Enable bit:
               0   Jazelle extension disabled. The BXJ instruction does not cause Jazelle state execution. BXJ behaves exactly as a BX instruction, see Jazelle state entry instruction, BXJ on page A2-74.
               1   Jazelle extension enabled.
               The reset value of this bit is 0.

To access the JMCR, read or write the CP14 registers with an MRC or MCR instruction with <opc1> set to 7, <CRn> set to c2, <CRm> set to c0, and <opc2> set to 0. For example:

    MRC p14, 7, <Rt>, c2, c0, 0    ; Read Jazelle Main Configuration register
    MCR p14, 7, <Rt>, c2, c0, 0    ; Write Jazelle Main Configuration register

Access to Jazelle registers

Table A2-14 shows the access permissions for the Jazelle registers, and how unprivileged access to the registers depends on the value of the JOSCR.

Table A2-14 Access to Jazelle registers

Jazelle register          Unprivileged access,      Unprivileged access,      Privileged access
                          JOSCR.CD == 0 (a)         JOSCR.CD == 1 (a)
JIDR                      Read access permitted     Read and write access     Read access permitted
                          Write access ignored      UNDEFINED                 Write access ignored
JMCR                      Read access UNDEFINED     Read and write access     Read and write access
                          Write access permitted    UNDEFINED                 permitted
SUBARCHITECTURE DEFINED   Read access UNDEFINED     Read and write access     Read access SUBARCHITECTURE DEFINED
configuration registers   Write access permitted    UNDEFINED                 Write access permitted

a. See Jazelle OS Control Register (JOSCR) on page B1-77.

EJVM operation

The following subsections summarize how an EJVM must operate, to meet the requirements of the architecture:
• Initialization
• Bytecode execution
• Jazelle exception conditions
• Other considerations on page A2-80.

Initialization

During initialization, the EJVM must first check which subarchitecture is present, by checking the Implementer and Subarchitecture codes in the value read from the JIDR.

If the EJVM is incompatible with the subarchitecture, it must do one of the following:
• write a value with JE == 0 to the JMCR
• if unaccelerated bytecode execution is unacceptable, generate an error.

If the EJVM is compatible with the subarchitecture, it must write its required configuration to the JMCR and any SUBARCHITECTURE DEFINED configuration registers.

Bytecode execution

The EJVM must contain a handler for each bytecode.
The EJVM initiates bytecode execution by executing a BXJ instruction with: • the register operand specifying the target address of the bytecode handler for the first bytecode of the program • the Application Level registers set up in accordance with the SUBARCHITECTURE DEFINED interface standard. The bytecode handler: • performs the data-processing operations required by the bytecode indicated • determines the address of the next bytecode to be executed • determines the address of the handler for that bytecode • performs a BXJ to that handler address with the registers again set up to the SUBARCHITECTURE DEFINED interface standard. Jazelle exception conditions During bytecode execution, the EJVM might encounter SUBARCHITECTURE DEFINED Jazelle exception conditions that must be resolved by a software handler. For example, in the case of a configuration invalid handler, the handler rewrites the desired configuration to the JMCR and to any SUBARCHITECTURE DEFINED configuration registers. ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A2-79 Application Level Programmers’ Model On entry to a Jazelle exception condition handler the contents of the Application Level registers are SUBARCHITECTURE DEFINED. This interface to the Jazelle exception condition handler might differ from the interface standard for the bytecode handler, in order to supply information about the Jazelle exception condition. The Jazelle exception condition handler: • resolves the Jazelle exception condition • determines the address of the next bytecode to be executed • determines the address of the handler for that bytecode • performs a BXJ to that handler address with the registers again set up to the SUBARCHITECTURE DEFINED interface standard. Other considerations To ensure application execution and correct interaction with an operating system, an EJVM must only perform operations that are permitted in unprivileged operation. 
In particular, for register accesses they must only: • read the JIDR, • write to the JMCR, and other configuration registers. An EJVM must not attempt to access the JOSCR. A2-80 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Application Level Programmers’ Model A2.11 Exceptions, debug events and checks ARMv7 uses the following terms to describe various types of exceptional condition: Exceptions In the ARM architecture, exceptions cause entry into a privileged mode and execution of a software handler for the exception. Note The terms floating-point exception and Jazelle exception condition do not use this meaning of exception. These terms are described later in this list. Exceptions include: • reset • interrupts • memory system aborts • undefined instructions • supervisor calls (SVCs). Most details of exception handling are not visible to application-level code, and are described in Exceptions on page B1-30. Aspects that are visible to application-level code are: • The SVC instruction causes an SVC exception. This provides a mechanism for unprivileged code to make a call to the operating system (or other privileged component of the software system). • If the Security Extensions are implemented, the SMC instruction causes an SMC exception, but only if it is executed in a privileged mode. Unprivileged code can only cause SMC exceptions to occur by methods defined by the operating system (or other privileged component of the software system). • The WFI instruction provides a hint that nothing needs to be done until an interrupt or similar exception is taken, see Wait For Interrupt on page B1-47. This permits the processor to enter a low-power state until that happens. • The WFE instruction provides a hint that nothing needs to be done until either an event is generated by an SEV instruction or an interrupt or similar exception is taken, see Wait For Event and Send Event on page B1-44. 
This permits the processor to enter a low-power state until one of these happens.
• The YIELD instruction provides a hint that the current execution thread is of low importance, see The Yield instruction on page A2-82.

Floating-point exceptions
These relate to exceptional conditions encountered during floating-point arithmetic, such as division by zero or overflow. For more information see:
• Floating-point exceptions on page A2-42
• Floating-point Status and Control Register (FPSCR) on page A2-28
• ANSI/IEEE Std. 754-1985, IEEE Standard for Binary Floating-Point Arithmetic.

Jazelle exception conditions
These are conditions that cause Jazelle hardware acceleration to exit into a software handler, as described in Jazelle exception conditions on page A2-79.

Debug events
These are conditions that cause a debug system to take action. Most aspects of debug events are not visible to application-level code, and are described in Chapter C3 Debug Events. Aspects that are visible to application-level code include:
• The BKPT instruction causes a BKPT Instruction debug event to occur, see BKPT Instruction debug events on page C3-20.
• The DBG instruction provides a hint to the debug system.

Checks
These are provided in the ThumbEE extension. A check causes an unconditional branch to a specific handler entry point. The base address of the ThumbEE check handlers is held in the TEEHBR, see ThumbEE Handler Base Register (TEEHBR) on page A2-71.

A2.11.1 The Yield instruction

In a Symmetric Multi-Threading (SMT) design, a thread can use a Yield instruction to give a hint to the processor that it is running on. The Yield hint indicates that whatever the thread is currently doing is of low importance, and so could yield. For example, the thread might be sitting in a spin-lock.
Similar behavior might be used to modify the arbitration priority of the snoop bus in a multiprocessor (MP) system. Defining such an instruction permits binary compatibility between SMT and SMP systems.

ARMv7 defines a YIELD instruction as a specific NOP-hint instruction, see YIELD on page A8-812.

The YIELD instruction has no effect in a single-threaded system, but developers of such systems can use the instruction to flag its intended use on migration to a multiprocessor or multithreading system. Operating systems can use YIELD in places where a yield hint is wanted, knowing that it will be treated as a NOP if there is no implementation benefit.

Chapter A3 Application Level Memory Model

This chapter gives an application level view of the memory model. It contains the following sections:
• Address space on page A3-2
• Alignment support on page A3-4
• Endian support on page A3-7
• Synchronization and semaphores on page A3-12
• Memory types and attributes and the memory order model on page A3-24
• Access rights on page A3-38
• Virtual and physical addressing on page A3-40
• Memory access order on page A3-41
• Caches and memory hierarchy on page A3-51.

A3.1 Address space

The ARM architecture uses a single, flat address space of 2^32 8-bit bytes. Byte addresses are treated as unsigned numbers, running from 0 to 2^32 - 1.

The address space is also regarded as:
• 2^30 32-bit words:
  - the address of each word is word-aligned, meaning that the address is divisible by 4 and the last two bits of the address are 0b00
  - the word at word-aligned address A consists of the four bytes with addresses A, A+1, A+2 and A+3.
• 2^31 16-bit halfwords:
  - the address of each halfword is halfword-aligned, meaning that the address is divisible by 2 and the last bit of the address is 0
  - the halfword at halfword-aligned address A consists of the two bytes with addresses A and A+1.

In some situations the ARM architecture supports accesses to halfwords and words that are not aligned to the appropriate access size, see Alignment support on page A3-4.

Normally, address calculations are performed using ordinary integer instructions. This means that the address wraps around if the calculation overflows or underflows the address space. Another way of describing this is that any address calculation is reduced modulo 2^32.

A3.1.1 Address incrementing and address space overflow

When a processor performs normal sequential execution of instructions, it effectively calculates:

(address_of_current_instruction) + (size_of_executed_instruction)

after each instruction to determine which instruction to execute next.

Note
The size of the executed instruction depends on the current instruction set, and might depend on the instruction executed.

If this address calculation overflows the top of the address space, the result is UNPREDICTABLE. In other words, a program must not rely on sequential execution of the instruction at address 0x00000000 after the instruction at address:
• 0xFFFFFFFC, when a 4-byte instruction is executed
• 0xFFFFFFFE, when a 2-byte instruction is executed
• 0xFFFFFFFF, when a single byte instruction is executed.

This UNPREDICTABLE behavior only applies to instructions that are executed, including those that fail their condition code check. Most ARM implementations prefetch instructions ahead of the currently-executing instruction.
If this prefetching overflows the top of the address space, it does not cause UNPREDICTABLE behavior unless a prefetched instruction with an overflowed address is actually executed.

LDC, LDM, LDRD, POP, PUSH, STC, STRD, and STM instructions access a sequence of words at increasing memory addresses, effectively incrementing the memory address by 4 for each load or store. If this calculation overflows the top of the address space, the result is UNPREDICTABLE. In other words, programs must not use these instructions in such a way that they attempt to access the word at address 0x00000000 sequentially after the word at address 0xFFFFFFFC.

Note
In some cases instructions that operate on multiple words can decrement the memory address by 4 after each word access. If this calculation underflows the address space, by decrementing the address 0x00000000, the result is UNPREDICTABLE.

The behavior of any unaligned load or store with a calculated address that would access the byte at 0xFFFFFFFF and the byte at address 0x00000000 as part of the instruction is UNPREDICTABLE.

A3.2 Alignment support

Instructions in the ARM architecture are aligned as follows:
• ARM instructions are word-aligned
• Thumb and ThumbEE instructions are halfword-aligned
• Java bytecodes are byte-aligned.

The data alignment behavior supported by the ARM architecture has changed significantly between ARMv4 and ARMv7. This behavior is indicated by the SCTLR.U bit, see:
• c1, System Control Register (SCTLR) on page B3-96 for a VMSAv7 implementation
• c1, System Control Register (SCTLR) on page B4-45 for a PMSAv7 implementation
• c1, System Control Register (SCTLR) on page AppxG-34 for architecture versions before ARMv7.

This bit defines the alignment behavior of the memory system for data accesses. Table A3-1 shows the values of SCTLR.U for the different architecture versions.
Table A3-1 SCTLR.U bit values for different architecture versions

Architecture version | SCTLR.U value
Before ARMv6 | 0
ARMv6 | 0 or 1
ARMv7 | 1

On an ARMv6 processor, the SCTLR.U bit indicates which of two possible alignment models is selected:

U == 0
The processor implements the legacy alignment model. This is described in Alignment on page AppxG-6.

Note
The use of U == 0 is deprecated in ARMv6T2, and is obsolete from ARMv7.

U == 1
The processor implements the alignment model described in this section. This model supports unaligned data accesses.

ARMv7 requires the processor to implement the alignment model described in this section.

A3.2.1 Unaligned data access

An ARMv7 implementation must support unaligned data accesses. The SCTLR.U bit is RAO to indicate this support. The SCTLR.A bit, the strict alignment bit, controls whether strict alignment is required. The checking of load and store alignment depends on the value of this bit. For more information, see c1, System Control Register (SCTLR) on page B3-96 for a VMSA implementation, or c1, System Control Register (SCTLR) on page B4-45 for a PMSA implementation.

Table A3-2 shows how the checking of load and store alignment depends on the instruction type and the value of SCTLR.A.
Table A3-2 Alignment requirements of load/store instructions

Instructions | Alignment check | Result if check fails when SCTLR.A == 0 | Result if check fails when SCTLR.A == 1
LDRB, LDREXB, LDRBT, LDRSB, LDRSBT, STRB, STREXB, STRBT, SWPB, TBB | None | - | -
LDRH, LDRHT, LDRSH, LDRSHT, STRH, STRHT, TBH | Halfword | Unaligned access | Alignment fault
LDREXH, STREXH | Halfword | Alignment fault | Alignment fault
LDR, LDRT, STR, STRT | Word | Unaligned access | Alignment fault
LDREX, STREX | Word | Alignment fault | Alignment fault
LDREXD, STREXD | Doubleword | Alignment fault | Alignment fault
All forms of LDM, LDRD, PUSH, POP, RFE, SRS, all forms of STM, STRD, SWP | Word | Alignment fault | Alignment fault
LDC, LDC2, STC, STC2 | Word | Alignment fault | Alignment fault
VLDM, VLDR, VSTM, VSTR | Word | Alignment fault | Alignment fault
VLD1, VLD2, VLD3, VLD4, VST1, VST2, VST3, VST4, all with standard alignment (a) | Element size | Unaligned access | Alignment fault
VLD1, VLD2, VLD3, VLD4, VST1, VST2, VST3, VST4, all with @<align> specified (a) | As specified by @<align> | Alignment fault | Alignment fault

a. These element and structure load/store instructions are only in the Advanced SIMD extension to the ARMv7 ARM and Thumb instruction sets. ARMv7 does not support the pre-ARMv6 alignment model, so you cannot use that model with these instructions.

A3.2.2 Cases where unaligned accesses are UNPREDICTABLE

The following cases cause the resulting unaligned accesses to be UNPREDICTABLE, and overrule any successful load or store behavior described in Unaligned data access on page A3-5:
• Any load instruction that is not faulted by the alignment restrictions and that loads the PC has UNPREDICTABLE behavior if the address it loads from is not word-aligned.
• Any unaligned access that is not faulted by the alignment restrictions and that accesses memory with the Strongly-ordered or Device attribute has UNPREDICTABLE behavior.
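The alignment checks in Table A3-2 can be sketched as a small lookup. The following Python model is illustrative only: the function name `alignment_result`, the dictionary `ALIGNMENT_RULES` and the result strings are assumptions made for this sketch, not ARM pseudocode, and only a few representative rows of the table are included. It does not model the UNPREDICTABLE cases of section A3.2.2.

```python
# Illustrative model of a few rows of Table A3-2 (hypothetical names).
# Each entry maps an instruction to:
#   (alignment check in bytes, True if a failed check faults even when SCTLR.A == 0)
ALIGNMENT_RULES = {
    "LDRB":   (1, False),  # no alignment check
    "LDRH":   (2, False),  # halfword check, unaligned access permitted when A == 0
    "LDREXH": (2, True),   # halfword check, always faults when misaligned
    "LDR":    (4, False),  # word check, unaligned access permitted when A == 0
    "LDREX":  (4, True),   # word check, always faults when misaligned
    "LDREXD": (8, True),   # doubleword check, always faults when misaligned
    "LDRD":   (4, True),   # word check, always faults when misaligned
    "LDM":    (4, True),   # word check, always faults when misaligned
}


def alignment_result(instruction, address, sctlr_a):
    """Classify one access as aligned, unaligned or faulting."""
    size, always_faults = ALIGNMENT_RULES[instruction]
    if address % size == 0:
        return "aligned access"
    if sctlr_a == 1 or always_faults:
        return "alignment fault"
    return "unaligned access"
```

For example, `alignment_result("LDR", 0x1001, 0)` gives an unaligned access, while the same address with `sctlr_a` set to 1, or with `"LDREX"` in place of `"LDR"`, gives an alignment fault.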
Note
The Strongly-ordered and Device memory attributes are described in Memory types and attributes and the memory order model on page A3-24.

A3.2.3 Unaligned data access restrictions in ARMv7 and ARMv6

ARMv7 and ARMv6 have the following restrictions on unaligned data accesses:
• Accesses are not guaranteed to be single-copy atomic, see Atomicity in the ARM architecture on page A3-26. An access can be synthesized out of a series of aligned operations in a shared memory system without guaranteeing locked transaction cycles.
• Unaligned accesses typically take a number of additional cycles to complete compared to a naturally aligned transfer. The real-time implications must be analyzed carefully and key data structures might need to have their alignment adjusted for optimum performance.
• If an unaligned access occurs across a page boundary, the operation can abort on either or both halves of the access.

Shared memory schemes must not rely on seeing monotonic updates of non-aligned data of loads and stores for data items larger than byte wide. For more information, see Atomicity in the ARM architecture on page A3-26.

Unaligned access operations must not be used for accessing Device memory-mapped registers. They must only be used with care in shared memory structures that are protected by aligned semaphores or synchronization variables.

A3.3 Endian support

The rules in Address space on page A3-2 require that for a word-aligned address A:
• the word at address A consists of the bytes at addresses A, A+1, A+2 and A+3
• the halfword at address A consists of the bytes at addresses A and A+1
• the halfword at address A+2 consists of the bytes at addresses A+2 and A+3
• the word at address A therefore consists of the halfwords at addresses A and A+2.

However, this does not specify completely the mappings between words, halfwords, and bytes.
A memory system uses one of the two following mapping schemes. This choice is known as the endianness of the memory system.

In a little-endian memory system:
• the byte or halfword at a word-aligned address is the least significant byte or halfword in the word at that address
• the byte at a halfword-aligned address is the least significant byte in the halfword at that address.

In a big-endian memory system:
• the byte or halfword at a word-aligned address is the most significant byte or halfword in the word at that address
• the byte at a halfword-aligned address is the most significant byte in the halfword at that address.

For a word-aligned address A, Table A3-3 and Table A3-4 on page A3-8 show the relationship between:
• the word at address A
• the halfwords at addresses A and A+2
• the bytes at addresses A, A+1, A+2 and A+3.

Table A3-3 shows this relationship for a big-endian memory system, and Table A3-4 on page A3-8 shows the relationship for a little-endian memory system.

Table A3-3 Big-endian memory system

MSByte | MSByte - 1 | LSByte + 1 | LSByte
Word at Address A (all four bytes)
Halfword at Address A (MSByte, MSByte - 1) | Halfword at Address A+2 (LSByte + 1, LSByte)
Byte at Address A | Byte at Address A+1 | Byte at Address A+2 | Byte at Address A+3

Table A3-4 Little-endian memory system

MSByte | MSByte - 1 | LSByte + 1 | LSByte
Word at Address A (all four bytes)
Halfword at Address A+2 (MSByte, MSByte - 1) | Halfword at Address A (LSByte + 1, LSByte)
Byte at Address A+3 | Byte at Address A+2 | Byte at Address A+1 | Byte at Address A

The big-endian and little-endian mapping schemes determine the order in which the bytes of a word or halfword are interpreted. For example, a load of a word (4 bytes) from address 0x1000 always results in an access of the bytes at memory locations 0x1000, 0x1001, 0x1002, and 0x1003. The endianness mapping scheme determines the significance of these four bytes.
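The word load example above can be checked with a short Python sketch. It uses the standard `int.from_bytes` function to interpret the same four bytes under each mapping scheme. The byte values and variable names are arbitrary choices made for illustration and do not come from the manual.

```python
# Four bytes as they sit in memory at addresses 0x1000..0x1003 (arbitrary values).
memory = bytes([0x11, 0x22, 0x33, 0x44])

# Little-endian: the byte at the lowest address is the least significant byte.
word_le = int.from_bytes(memory, "little")        # 0x44332211
half_a_le = int.from_bytes(memory[0:2], "little")   # halfword at A:   0x2211
half_a2_le = int.from_bytes(memory[2:4], "little")  # halfword at A+2: 0x4433

# Big-endian: the byte at the lowest address is the most significant byte.
word_be = int.from_bytes(memory, "big")           # 0x11223344
half_a_be = int.from_bytes(memory[0:2], "big")      # halfword at A:   0x1122
half_a2_be = int.from_bytes(memory[2:4], "big")     # halfword at A+2: 0x3344
```

In both cases the same four memory locations are accessed. In the little-endian result the halfword at A is the least significant half of the word, and in the big-endian result it is the most significant half, as in Table A3-4 and Table A3-3.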
A3.3.1 Control of the endianness mapping scheme in ARMv7

In ARMv7-A, the mapping of instruction memory is always little-endian. In ARMv7-R, instruction endianness can be controlled at the system level, see Instruction endianness.

For information about data memory endianness control, see ENDIANSTATE on page A2-19.

Note
Versions of the ARM architecture before ARMv7 had a different mechanism to control the endianness, see Endian configuration and control on page AppxG-20.

A3.3.2 Instruction endianness

Before ARMv7, the ARM architecture included legacy support for an alternative big-endian memory model, described as BE-32 and controlled by the B bit, bit [7], of the SCTLR, see c1, System Control Register (SCTLR) on page AppxG-34. ARMv7 does not support BE-32 operation, and bit [7] of the SCTLR is RAZ.

Where legacy object code for ARM processors contains instructions with a big-endian byte order, the removal of support for BE-32 operation requires the instructions in the object files to have their bytes reversed for the code to be executed on an ARMv7 processor. This means that:
• each Thumb instruction, whether a 32-bit Thumb instruction or a 16-bit Thumb instruction, must have the byte order of each halfword of instruction reversed
• each ARM instruction must have the byte order of each word of instruction reversed.

For most situations, this can be handled in the link stage of a tool-flow, provided the object files include sufficient information to permit this to happen. In practice, this is the situation for all applications with the ARMv7-A profile.

For applications of the ARMv7-R profile, there are some legacy code situations where the arrangement of the bytes in the object files cannot be adjusted by the linker.
For these object files to be used by an ARMv7-R processor the byte order of the instructions must be reversed by the processor at runtime. Therefore, the ARMv7-R profile permits configuration of the instruction endianness.

Instruction endianness static configuration, ARMv7-R only

To provide support for legacy big-endian object code, the ARMv7-R profile supports optional byte order reversal hardware as a static option from reset. The ARMv7-R profile includes a read-only bit in the CP15 Control Register, SCTLR.IE, bit [31]. For more information, see c1, System Control Register (SCTLR) on page B4-45.

A3.3.3 Element size and endianness

The effect of the endianness mapping on data transfers depends on the size of the data element or elements transferred by the load/store instructions. Table A3-5 lists the element sizes of all the load/store instructions, for all instruction sets.

Table A3-5 Element size of load/store instructions

Instructions | Element size
LDRB, LDREXB, LDRBT, LDRSB, LDRSBT, STRB, STREXB, STRBT, SWPB, TBB | Byte
LDRH, LDREXH, LDRHT, LDRSH, LDRSHT, STRH, STREXH, STRHT, TBH | Halfword
LDR, LDRT, LDREX, STR, STRT, STREX | Word
LDRD, LDREXD, STRD, STREXD | Word
All forms of LDM, PUSH, POP, RFE, SRS, all forms of STM, SWP | Word
LDC, LDC2, STC, STC2, VLDM, VLDR, VSTM, VSTR | Word
VLD1, VLD2, VLD3, VLD4, VST1, VST2, VST3, VST4 | Element size of the Advanced SIMD access

A3.3.4 Instructions to reverse bytes in a general-purpose register

An application or device driver might have to interface to memory-mapped peripheral registers or shared memory structures that are not the same endianness as the internal data structures. Similarly, the endianness of the operating system might not match that of the peripheral registers or shared memory. In these cases, the processor requires an efficient method to transform explicitly the endianness of the data.

In ARMv7, the ARM and Thumb instruction sets provide this functionality.
There are instructions to:
• Reverse word (four bytes) register, for transforming big-endian and little-endian 32-bit representations. See REV on page A8-272.
• Reverse halfword and sign-extend, for transforming signed 16-bit representations. See REVSH on page A8-276.
• Reverse packed halfwords in a register for transforming big-endian and little-endian 16-bit representations. See REV16 on page A8-274.

A3.3.5 Endianness in Advanced SIMD

Advanced SIMD element load/store instructions transfer vectors of elements between memory and the Advanced SIMD register bank. An instruction specifies both the length of the transfer and the size of the data elements being transferred. This information is used by the processor to load and store data correctly in both big-endian and little-endian systems.

Consider, for example, the instruction:

VLD1.16 {D0}, [R1]

This loads a 64-bit register with four 16-bit values. The four elements appear in the register in array order, with the lowest indexed element fetched from the lowest address. The order of bytes in the elements depends on the endianness configuration, as shown in Figure A3-1. Therefore, the order of the elements in the registers is the same regardless of the endianness configuration. This means that Advanced SIMD code is usually independent of endianness.

64-bit register containing four 16-bit elements, most significant byte first:
D[15:8] D[7:0] C[15:8] C[7:0] B[15:8] B[7:0] A[15:8] A[7:0]

Memory system with Little endian addressing (LE), accessed by VLD1.16 {D0}, [R1]:
Byte address | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
Contents | A[7:0] | A[15:8] | B[7:0] | B[15:8] | C[7:0] | C[15:8] | D[7:0] | D[15:8]

Memory system with Big endian addressing (BE), accessed by VLD1.16 {D0}, [R1]:
Byte address | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
Contents | A[15:8] | A[7:0] | B[15:8] | B[7:0] | C[15:8] | C[7:0] | D[15:8] | D[7:0]

Figure A3-1 Advanced SIMD byte order example

The Advanced SIMD extension supports Little-Endian (LE) and Big-Endian (BE) models.
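The VLD1.16 example in Figure A3-1 can be modelled with the standard `struct` module. This is a minimal sketch, assuming a hypothetical helper `vld1_16` and arbitrary element values chosen here for illustration; it models only the byte-to-element mapping, not the instruction itself.

```python
import struct


def vld1_16(memory, big_endian):
    """Return the four 16-bit elements, in array order, read from 8 bytes of memory."""
    fmt = ">4H" if big_endian else "<4H"
    return list(struct.unpack(fmt, memory))


# Elements A, B, C, D in array order (arbitrary values).
elements = [0xA1A0, 0xB1B0, 0xC1C0, 0xD1D0]

# The same array as it is laid out in a little-endian and a big-endian memory system.
le_memory = struct.pack("<4H", *elements)  # byte 0 holds A[7:0]
be_memory = struct.pack(">4H", *elements)  # byte 0 holds A[15:8]

loaded_le = vld1_16(le_memory, big_endian=False)
loaded_be = vld1_16(be_memory, big_endian=True)
```

The byte layout in memory differs between the two systems, but `loaded_le` and `loaded_be` hold the same elements in the same order, which is why element load/store code is usually independent of endianness.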
For information about the alignment of Advanced SIMD instructions see Unaligned data access on page A3-5.

Note
Advanced SIMD is an extension to the ARMv7 ARM and Thumb instruction sets. In ARMv7, the SCTLR.B bit always has the value 0, indicating that ARMv7 does not support the legacy BE-32 endianness model, and you cannot use this model with Advanced SIMD element and structure load/store instructions.

A3.4 Synchronization and semaphores

In architecture versions before ARMv6, support for the synchronization of shared memory depends on the SWP and SWPB instructions. These are read-locked-write operations that swap register contents with memory, and are described in SWP, SWPB on page A8-432. These instructions support basic busy/free semaphore mechanisms, but do not support mechanisms that require calculation to be performed on the semaphore between the read and write phases.

ARMv6 introduced a new mechanism to support more comprehensive non-blocking synchronization of shared memory, using synchronization primitives that scale for multiprocessor system designs. ARMv6 provided a pair of synchronization primitives, LDREX and STREX. ARMv7 extends the new model by:
• adding byte, halfword and doubleword versions of the synchronization primitives
• adding a Clear-Exclusive instruction, CLREX
• adding the synchronization primitives to the Thumb instruction set.

Note
From ARMv6, use of the SWP and SWPB instructions is deprecated. ARM strongly recommends that all software migrates to using the new synchronization primitives described in this section.
In ARMv7, the synchronization primitives provided in the ARM and Thumb instruction sets are:
• Load-Exclusives:
  - LDREX, see LDREX on page A8-142
  - LDREXB, see LDREXB on page A8-144
  - LDREXD, see LDREXD on page A8-146
  - LDREXH, see LDREXH on page A8-148
• Store-Exclusives:
  - STREX, see STREX on page A8-400
  - STREXB, see STREXB on page A8-402
  - STREXD, see STREXD on page A8-404
  - STREXH, see STREXH on page A8-406
• Clear-Exclusive, CLREX, see CLREX on page A8-70.

Note
This section describes the operation of a Load-Exclusive/Store-Exclusive pair of synchronization primitives using, as examples, the LDREX and STREX instructions. The same description applies to any other pair of synchronization primitives:
• LDREXB used with STREXB
• LDREXD used with STREXD
• LDREXH used with STREXH.
Each Load-Exclusive instruction must be used only with the corresponding Store-Exclusive instruction.

The model for the use of a Load-Exclusive/Store-Exclusive instruction pair, accessing a non-aborting memory address x is:
• The Load-Exclusive instruction reads a value from memory address x.
• The corresponding Store-Exclusive instruction succeeds in writing back to memory address x only if no other observer, process, or thread has performed a more recent store of address x. The Store-Exclusive operation returns a status bit that indicates whether the memory write succeeded.

A Load-Exclusive instruction tags a small block of memory for exclusive access. The size of the tagged block is IMPLEMENTATION DEFINED, see Tagging and the size of the tagged memory block on page A3-20. A Store-Exclusive instruction to the same address clears the tag.

Note
In this section, the term processor includes any observer that can generate a Load-Exclusive or a Store-Exclusive.
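The usage model above is commonly expressed as a retry loop: load exclusively, compute the new value, attempt the exclusive store, and repeat if the status indicates failure. The following Python sketch is a toy, single-threaded model. The class `ExclusiveMemory` and the function `atomic_increment` are hypothetical names invented for this illustration, not an ARM API, and the model ignores the IMPLEMENTATION DEFINED size of the tagged block.

```python
class ExclusiveMemory:
    """Toy model: one memory location with a single exclusive-access tag."""

    def __init__(self, value=0):
        self.value = value
        self.tagged = False

    def load_exclusive(self):
        self.tagged = True            # tag the location for exclusive access
        return self.value

    def store_exclusive(self, new_value):
        if not self.tagged:
            return 1                  # status 1: the write did not succeed
        self.value = new_value
        self.tagged = False           # a Store-Exclusive clears the tag
        return 0                      # status 0: the write succeeded

    def store_by_other_observer(self, new_value):
        self.value = new_value
        self.tagged = False           # modelled here as clearing the tag


def atomic_increment(mem, interfere=None):
    """Increment mem with a load-exclusive/store-exclusive retry loop.

    Returns the number of attempts. If interfere is given, it is called once,
    between the first load and the first store, to simulate another observer.
    """
    attempts = 0
    while True:
        attempts += 1
        old = mem.load_exclusive()
        if interfere is not None and attempts == 1:
            interfere(mem)
        if mem.store_exclusive(old + 1) == 0:
            return attempts
```

Without interference the loop succeeds on the first attempt. If another observer stores to the location between the load and the store, the first Store-Exclusive returns status 1 and the loop retries with the newer value, so the increment is applied to the value written by the other observer.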
A3.4.1 Exclusive access instructions and Non-shareable memory regions

For memory regions that do not have the Shareable attribute, the exclusive access instructions rely on a local monitor that tags any address from which the processor executes a Load-Exclusive. Any non-aborted attempt by the same processor to use a Store-Exclusive to modify any address is guaranteed to clear the tag.

A Load-Exclusive performs a load from memory, and:
• the executing processor tags the physical memory address for exclusive access
• the local monitor of the executing processor transitions to its Exclusive Access state.

A Store-Exclusive performs a conditional store to memory, that depends on the state of the local monitor:

If the local monitor is in its Exclusive Access state
• If the address of the Store-Exclusive is the same as the address that has been tagged in the monitor by an earlier Load-Exclusive, then the store takes place, otherwise it is IMPLEMENTATION DEFINED whether the store takes place.
• A status value is returned to a register:
  - if the store took place the status value is 0
  - otherwise, the status value is 1.
• The local monitor of the executing processor transitions to its Open Access state.

If the local monitor is in its Open Access state
• no store takes place
• a status value of 1 is returned to a register
• the local monitor remains in its Open Access state.

The Store-Exclusive instruction defines the register to which the status value is returned.

When a processor writes using any instruction other than a Store-Exclusive:
• if the write is to a physical address that is not covered by its local monitor the write does not affect the state of the local monitor
• if the write is to a physical address that is covered by its local monitor it is IMPLEMENTATION DEFINED whether the write affects the state of the local monitor.
If the local monitor is in its Exclusive Access state and the processor performs a Store-Exclusive to any address other than the last one from which it performed a Load-Exclusive, it is IMPLEMENTATION DEFINED whether the store updates memory, but in all cases the local monitor is reset to its Open Access state. This mechanism:
• is used on a context switch, see Context switch support on page A3-21
• must be treated as a software programming error in all other cases.

Note
It is UNPREDICTABLE whether a store to a tagged physical address causes a tag in the local monitor to be cleared if that store is by an observer other than the one that caused the physical address to be tagged.

Figure A3-2 shows the state machine for the local monitor. Table A3-6 on page A3-15 shows the effect of each of the operations shown in the figure.

[Figure A3-2: state diagram with two states, Open Access and Exclusive Access. Transitions:
Open Access to Open Access: CLREX, StoreExcl(x), Store(x)
Open Access to Exclusive Access: LoadExcl(x)
Exclusive Access to Exclusive Access: LoadExcl(x), Store(!Tagged_address), Store(Tagged_address) *
Exclusive Access to Open Access: CLREX, StoreExcl(Tagged_address), StoreExcl(!Tagged_address), Store(Tagged_address) *]

Operations marked * are possible alternative IMPLEMENTATION DEFINED options.
In the diagram:
LoadExcl represents any Load-Exclusive instruction
StoreExcl represents any Store-Exclusive instruction
Store represents any other store instruction.
Any LoadExcl operation updates the tagged address to the most significant bits of the address x used for the operation. For more information see the section Size of the tagged memory block.

Figure A3-2 Local monitor state machine diagram

Note
For the local monitor state machine, as shown in Figure A3-2 on page A3-14:
• The IMPLEMENTATION DEFINED options for the local monitor are consistent with the local monitor being constructed so that it does not hold any physical address, but instead treats any access as matching the address of the previous LoadExcl.
• A local monitor implementation can be unaware of Load-Exclusive and Store-Exclusive operations from other processors.
• It is UNPREDICTABLE whether the transition from Exclusive Access to Open Access state occurs when the Store or StoreExcl is from another observer.

Table A3-6 shows the effect of the operations shown in Figure A3-2 on page A3-14.

Table A3-6 Effect of Exclusive instructions and write operations on local monitor

Initial state | Operation (a) | Effect | Final state
Open Access | CLREX | No effect | Open Access
Open Access | StoreExcl(x) | Does not update memory, returns status 1 | Open Access
Open Access | LoadExcl(x) | Loads value from memory, tags address x | Exclusive Access
Open Access | Store(x) | Updates memory, no effect on monitor | Open Access
Exclusive Access | CLREX | Clears tagged address | Open Access
Exclusive Access | StoreExcl(t) | Updates memory, returns status 0 | Open Access
Exclusive Access | StoreExcl(!t) | Updates memory, returns status 0 (b), or does not update memory, returns status 1 (b) | Open Access
Exclusive Access | LoadExcl(x) | Loads value from memory, changes tag address to x | Exclusive Access
Exclusive Access | Store(!t) | Updates memory, no effect on monitor | Exclusive Access
Exclusive Access | Store(t) | Updates memory | Exclusive Access (b) or Open Access (b)

a. In the table:
LoadExcl represents any Load-Exclusive instruction
StoreExcl represents any Store-Exclusive instruction
Store represents any store operation other than a Store-Exclusive operation.
t is the tagged address, bits [31:a] of the address of the last Load-Exclusive instruction. For more information, see Tagging and the size of the tagged memory block on page A3-20.
b. IMPLEMENTATION DEFINED alternative actions.
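The transitions in Table A3-6 can be sketched as a small state machine. The following Python model is illustrative only: the class `LocalMonitor` and its method names are assumptions made for this sketch. Where the table lists IMPLEMENTATION DEFINED alternatives, the sketch arbitrarily picks one (StoreExcl(!t) does not update memory, and Store(t) leaves the monitor in Exclusive Access), and it treats the tag as an exact address rather than bits [31:a].

```python
OPEN = "Open Access"
EXCLUSIVE = "Exclusive Access"


class LocalMonitor:
    """Toy model of one processor's local monitor plus a dictionary as memory."""

    def __init__(self):
        self.state = OPEN
        self.tag = None
        self.memory = {}

    def load_excl(self, addr):
        # LoadExcl(x): loads the value, tags address x, enters Exclusive Access.
        self.state, self.tag = EXCLUSIVE, addr
        return self.memory.get(addr, 0)

    def store_excl(self, addr, value):
        # Returns status 0 if the store took place, otherwise status 1.
        if self.state == EXCLUSIVE and addr == self.tag:
            self.memory[addr] = value
            status = 0
        else:
            # Open Access, or StoreExcl(!t) under the choice made in this sketch.
            status = 1
        self.state, self.tag = OPEN, None
        return status

    def store(self, addr, value):
        # Ordinary store: updates memory; this sketch leaves the monitor unchanged.
        self.memory[addr] = value

    def clrex(self):
        # CLREX: clears the tagged address, monitor ends in Open Access.
        self.state, self.tag = OPEN, None
```

A short walk through the table: a `store_excl` in Open Access returns 1 and leaves memory unchanged; `load_excl` followed by `store_excl` to the same address returns 0; and a `clrex` between the two makes the `store_excl` fail.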
A3.4.2 Exclusive access instructions and Shareable memory regions

For memory regions that have the Shareable attribute, exclusive access instructions rely on:
• A local monitor for each processor in the system, that tags any address from which the processor executes a Load-Exclusive. The local monitor operates as described in Exclusive access instructions and Non-shareable memory regions on page A3-13, except that for Shareable memory any Store-Exclusive is then subject to checking by the global monitor if it is described in that section as doing at least one of:
  - updating memory
  - returning a status value of 0.
  The local monitor can ignore exclusive accesses from other processors in the system.
• A global monitor that tags a physical address as exclusive access for a particular processor. This tag is used later to determine whether a Store-Exclusive to that address that has not been failed by the local monitor can occur. Any successful write to the tagged address by any other observer in the shareability domain of the memory location is guaranteed to clear the tag. For each processor in the system, the global monitor:
  - holds a single tagged address
  - maintains a state machine.

The global monitor can either reside in a processor block or exist as a secondary monitor at the memory interfaces.

Note
An implementation can combine the functionality of the global and local monitors into a single unit.

Operation of the global monitor

Load-Exclusive from Shareable memory performs a load from memory, and causes the physical address of the access to be tagged as exclusive access for the requesting processor. This access also causes the exclusive access tag to be removed from any other physical address that has been tagged by the requesting processor. The global monitor only supports a single outstanding exclusive access to Shareable memory per processor.
Store-Exclusive performs a conditional store to memory:
• The store is guaranteed to succeed only if the physical address accessed is tagged as exclusive access for the requesting processor and both the local monitor and the global monitor state machines for the requesting processor are in the Exclusive Access state. In this case:
  - a status value of 0 is returned to a register to acknowledge the successful store
  - the final state of the global monitor state machine for the requesting processor is IMPLEMENTATION DEFINED
  - if the address accessed is tagged for exclusive access in the global monitor state machine for any other processor then that state machine transitions to Open Access state.
• If no address is tagged as exclusive access for the requesting processor, the store does not succeed:
  - a status value of 1 is returned to a register to indicate that the store failed
  - the global monitor is not affected and remains in Open Access state for the requesting processor.
• If a different physical address is tagged as exclusive access for the requesting processor, it is IMPLEMENTATION DEFINED whether the store succeeds or not:
  - if the store succeeds a status value of 0 is returned to a register, otherwise a value of 1 is returned
  - if the global monitor state machine for the processor was in the Exclusive Access state before the Store-Exclusive it is IMPLEMENTATION DEFINED whether that state machine transitions to the Open Access state.

The Store-Exclusive instruction defines the register to which the status value is returned.

In a shared memory system, the global monitor implements a separate state machine for each processor in the system. The state machine for accesses to Shareable memory by processor (n) can respond to all the Shareable memory accesses visible to it.
This means it responds to:
• accesses generated by the associated processor (n)
• accesses generated by the other observers in the shareability domain of the memory location (!n).

In a shared memory system, the global monitor implements a separate state machine for each observer that can generate a Load-Exclusive or a Store-Exclusive in the system.

Figure A3-3 on page A3-18 shows the state machine for processor(n) in a global monitor. Table A3-7 on page A3-19 shows the effect of each of the operations shown in the figure.

[State diagram with two states, Open Access and Exclusive Access, and the following transitions]
Open Access to Open Access: CLREX(n), CLREX(!n), LoadExcl(x,!n), StoreExcl(x,n), StoreExcl(x,!n), Store(x,n), Store(x,!n)
Open Access to Exclusive Access: LoadExcl(x,n)
Exclusive Access to Exclusive Access: LoadExcl(x,n), StoreExcl(Tagged_address,!n) ‡, Store(!Tagged_address,n), StoreExcl(Tagged_address,n) *, StoreExcl(!Tagged_address,n) *, Store(Tagged_address,n) *, CLREX(n) *, StoreExcl(!Tagged_address,!n), Store(!Tagged_address,!n), CLREX(!n)
Exclusive Access to Open Access: StoreExcl(Tagged_address,!n) ‡, Store(Tagged_address,!n), StoreExcl(Tagged_address,n) *, StoreExcl(!Tagged_address,n) *, Store(Tagged_address,n) *, CLREX(n) *

‡ StoreExcl(Tagged_address,!n) clears the monitor only if the StoreExcl updates memory.
Operations marked * are possible alternative IMPLEMENTATION DEFINED options.

In the diagram:
LoadExcl represents any Load-Exclusive instruction
StoreExcl represents any Store-Exclusive instruction
Store represents any other store instruction.
Any LoadExcl operation updates the tagged address to the most significant bits of the address x used for the operation. For more information see the section Size of the tagged memory block.
Figure A3-3 Global monitor state machine diagram for processor(n) in a multiprocessor system

Note
For the global monitor state machine, as shown in Figure A3-3:
• Whether a Store-Exclusive successfully updates memory or not depends on whether the address accessed matches the tagged Shareable memory address for the processor issuing the Store-Exclusive instruction. For this reason, Figure A3-3 and Table A3-7 on page A3-19 only show how the (!n) entries cause state transitions of the state machine for processor(n).
• A Load-Exclusive can only update the tagged Shareable memory address for the processor issuing the Load-Exclusive instruction.
• The effect of the CLREX instruction on the global monitor is IMPLEMENTATION DEFINED.
• It is IMPLEMENTATION DEFINED:
  — whether a modification to a non-shareable memory location can cause a global monitor to transition from Exclusive Access to Open Access state
  — whether a Load-Exclusive to a non-shareable memory location can cause a global monitor to transition from Open Access to Exclusive Access state.

Table A3-7 shows the effect of the operations shown in Figure A3-3 on page A3-18.

Table A3-7 Effect of load/store operations on global monitor for processor(n)

Initial state (a) | Operation (b) | Effect | Final state (a)
Open | CLREX(n), CLREX(!n) | None | Open
Open | StoreExcl(x,n) | Does not update memory, returns status 1 | Open
Open | LoadExcl(x,!n) | Loads value from memory, no effect on tag address for processor(n) | Open
Open | StoreExcl(x,!n) | Depends on state machine and tag address for processor issuing STREX (c) | Open
Open | Store(x,n), Store(x,!n) | Updates memory, no effect on monitor | Open
Open | LoadExcl(x,n) | Loads value from memory, tags address x | Exclusive
Exclusive | LoadExcl(x,n) | Loads value from memory, tags address x | Exclusive
Exclusive | CLREX(n) | None. Effect on the final state is IMPLEMENTATION DEFINED. | Exclusive (e) or Open (e)
Exclusive | CLREX(!n) | None | Exclusive
Exclusive | StoreExcl(t,!n) | Updates memory, returns status 0 (c) | Open
Exclusive | StoreExcl(t,!n) | Does not update memory, returns status 1 (c) | Exclusive
Exclusive | StoreExcl(t,n) | Updates memory, returns status 0 (d) | Open or Exclusive
Exclusive | StoreExcl(!t,n) | Updates memory, returns status 0 (e) | Open or Exclusive
Exclusive | StoreExcl(!t,n) | Does not update memory, returns status 1 (e) | Open or Exclusive
Exclusive | StoreExcl(!t,!n) | Depends on state machine and tag address for processor issuing STREX | Exclusive
Exclusive | Store(t,n) | Updates memory | Exclusive (e) or Open (e)
Exclusive | Store(t,!n) | Updates memory | Open
Exclusive | Store(!t,n), Store(!t,!n) | Updates memory, no effect on monitor | Exclusive

a. Open = Open Access state, Exclusive = Exclusive Access state.
b. In the table:
   LoadExcl represents any Load-Exclusive instruction
   StoreExcl represents any Store-Exclusive instruction
   Store represents any store operation other than a Store-Exclusive operation.
   t is the tagged address for processor(n), bits [31:a] of the address of the last Load-Exclusive instruction issued by processor(n), see Tagging and the size of the tagged memory block.
c. The result of a STREX(x,!n) or a STREX(t,!n) operation depends on the state machine and tagged address for the processor issuing the STREX instruction. This table shows how each possible outcome affects the state machine for processor(n).
d. After a successful STREX to the tagged address, the state of the state machine is IMPLEMENTATION DEFINED. However, this state has no effect on the subsequent operation of the global monitor.
e. Effect is IMPLEMENTATION DEFINED. The table shows all permitted implementations.
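The transitions in Table A3-7 can be explored with a small software model. The following Python sketch is illustrative only: the class name GlobalMonitorModel, its methods, and the default a=3 are hypothetical, the local monitor is not modelled, and wherever the table permits more than one outcome the sketch picks one option, as listed in the docstring.

```python
OPEN, EXCLUSIVE = "Open Access", "Exclusive Access"


class GlobalMonitorModel:
    """Toy model of the per-processor global monitor state machines.

    Hypothetical sketch. Assumed IMPLEMENTATION DEFINED choices:
      * CLREX(n) moves the state machine for n to Open Access;
      * StoreExcl(!t, n) fails (status 1) and moves to Open Access;
      * a successful StoreExcl(t, n) leaves the state machine in Open Access;
      * Store(t, n) leaves the state machine for n unchanged.
    """

    def __init__(self, processors, a=3):
        self.a = a  # the tag is address bits [31:a]
        self.state = {p: OPEN for p in processors}
        self.tag = {p: None for p in processors}
        self.memory = {}

    def _tag_of(self, address):
        return address >> self.a

    def _clear_other_tags(self, address, writer):
        # A write that updates memory clears a matching tag held by any other processor.
        for p in self.state:
            if (p != writer and self.state[p] == EXCLUSIVE
                    and self.tag[p] == self._tag_of(address)):
                self.state[p] = OPEN

    def load_excl(self, address, p):
        # LoadExcl(x, n): load the value, tag x for n, enter Exclusive Access.
        self.tag[p] = self._tag_of(address)
        self.state[p] = EXCLUSIVE
        return self.memory.get(address, 0)

    def store(self, address, value, p):
        # Store(x, n): always updates memory; only other processors' tags can be cleared.
        self.memory[address] = value
        self._clear_other_tags(address, p)

    def store_excl(self, address, value, p):
        # StoreExcl(x, n): succeeds only if n is in Exclusive Access with a matching tag.
        tagged = self.state[p] == EXCLUSIVE and self.tag[p] == self._tag_of(address)
        self.state[p] = OPEN  # assumed choice for both the success and failure cases
        if not tagged:
            return 1
        self.memory[address] = value
        self._clear_other_tags(address, p)
        return 0

    def clrex(self, p):
        self.state[p] = OPEN  # assumed choice; the architecture leaves this open
```

With two processors that both execute a Load-Exclusive to the same address, the first Store-Exclusive in this model returns 0 and moves the other processor to Open Access, so the second Store-Exclusive returns 1 and must be retried from the Load-Exclusive.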
A3.4.3 Tagging and the size of the tagged memory block

As stated in the footnotes to Table A3-6 on page A3-15 and Table A3-7 on page A3-19, when a Load-Exclusive instruction is executed, the resulting tag address ignores the least significant bits of the memory address.

Tagged_address = Memory_address[31:a]

The value of a in this assignment is IMPLEMENTATION DEFINED, between a minimum value of 3 and a maximum value of 11. For example, in an implementation where a == 4, a successful LDREX of address 0x000341B4 gives a tag value of bits [31:4] of the address, giving 0x000341B. This means that the four words of memory from 0x000341B0 to 0x000341BF are tagged for exclusive access.

The size of the tagged memory block is called the Exclusives Reservation Granule. The Exclusives Reservation Granule is IMPLEMENTATION DEFINED between:
• two words, in an implementation with a == 3
• 512 words, in an implementation with a == 11.

In some implementations the CTR identifies the Exclusives Reservation Granule, see:
• c0, Cache Type Register (CTR) on page B3-83 for a VMSA implementation
• c0, Cache Type Register (CTR) on page B4-34 for a PMSA implementation.

A3.4.4 Context switch support

After a context switch, software must ensure that the local monitor is in the Open Access state. This requires it to either:
• execute a CLREX instruction
• execute a dummy STREX to a memory address allocated for this purpose.

Note
• Using a dummy STREX for this purpose is backwards-compatible with the ARMv6 implementation of the exclusive operations. The CLREX instruction is introduced in ARMv6K.
• Context switching is not an application level operation. However, this information is included here to complete the description of the exclusive operations.
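The tag calculation from Tagging and the size of the tagged memory block can be checked with a few lines of arithmetic. The Python sketch below is illustrative only; the function names are hypothetical, and the value of a is passed in explicitly because it is IMPLEMENTATION DEFINED.

```python
def tagged_address(memory_address, a):
    """Return Memory_address[31:a] for a 32-bit address."""
    if not 3 <= a <= 11:
        raise ValueError("a must be between 3 and 11")
    return (memory_address & 0xFFFFFFFF) >> a


def reservation_granule(memory_address, a):
    """Return (first, last) byte address of the tagged memory block."""
    first = tagged_address(memory_address, a) << a
    return first, first + (1 << a) - 1


def granule_words(a):
    """Return the size of the Exclusives Reservation Granule in 32-bit words."""
    if not 3 <= a <= 11:
        raise ValueError("a must be between 3 and 11")
    return (1 << a) // 4
```

For the example in the text, tagged_address(0x000341B4, 4) gives 0x000341B and reservation_granule(0x000341B4, 4) gives the block from 0x000341B0 to 0x000341BF.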
The STREX or CLREX instruction following a context switch might cause a subsequent Store-Exclusive to fail, requiring a load … store sequence to be replayed. To minimize the possibility of this happening, ARM recommends that the Store-Exclusive instruction is kept as close as possible to the associated Load-Exclusive instruction, see Load-Exclusive and Store-Exclusive usage restrictions.

A3.4.5 Load-Exclusive and Store-Exclusive usage restrictions

The Load-Exclusive and Store-Exclusive instructions are intended to work together, as a pair, for example an LDREX/STREX pair or an LDREXB/STREXB pair. As mentioned in Context switch support, ARM recommends that the Store-Exclusive instruction always follows within a few instructions of its associated Load-Exclusive instruction. To support different implementations of these functions, software must follow the notes and restrictions given here.

These notes describe use of an LDREX/STREX pair, but apply equally to any other Load-Exclusive/Store-Exclusive pair:

• The exclusives support a single outstanding exclusive access for each processor thread that is executed. The architecture makes use of this by not requiring an address or size check as part of the IsExclusiveLocal() function. If the target address of an STREX is different from the preceding LDREX in the same execution thread, behavior can be UNPREDICTABLE. As a result, an LDREX/STREX pair can only be relied upon to eventually succeed if they are executed with the same address. Where a context switch or exception might result in a change of execution thread, a CLREX instruction or a dummy STREX instruction must be executed to avoid unwanted effects, as described in Context switch support. Using an STREX in this way is the only occasion where software can program an STREX with a different address from the previously executed LDREX.
• An explicit store to memory can cause the clearing of exclusive monitors associated with other processors. Therefore, performing a store between the LDREX and the STREX can result in a livelock situation. As a result, code must avoid placing an explicit store between an LDREX and an STREX in a single code sequence.

• If two STREX instructions are executed without an intervening LDREX the second STREX returns a status value of 1. This means that:
  — every STREX must have a preceding LDREX associated with it in a given thread of execution
  — it is not necessary for every LDREX to have a subsequent STREX.

• An implementation of the Load-Exclusive and Store-Exclusive instructions can require that, in any thread of execution, the transaction size of a Store-Exclusive is the same as the transaction size of the preceding Load-Exclusive that was executed in that thread. If the transaction size of a Store-Exclusive is different from the preceding Load-Exclusive in the same execution thread, behavior can be UNPREDICTABLE. As a result, an LDREX/STREX pair can be relied upon to eventually succeed only if they have the same size. Where a context switch or exception might result in a change of execution thread, the software must execute a CLREX instruction or a dummy STREX instruction to avoid unwanted effects, as described in Context switch support on page A3-21. Using an STREX in this way is the only occasion where software can use a Store-Exclusive instruction with a different transaction size from the previously executed Load-Exclusive instruction.

• An implementation might clear an exclusive monitor between the LDREX and the STREX, without any application-related cause. For example, this might happen because of cache evictions.
Code written for such an implementation must avoid having any explicit memory accesses or cache maintenance operations between the LDREX and STREX instructions.

• Implementations can benefit from keeping the LDREX and STREX operations close together in a single code sequence. This minimizes the likelihood of the exclusive monitor state being cleared between the LDREX instruction and the STREX instruction. Therefore, ARM strongly recommends a limit of 128 bytes between LDREX and STREX instructions in a single code sequence, for best performance.

• Implementations that implement coherent protocols, or have only a single master, might combine the local and global monitors for a given processor. The IMPLEMENTATION DEFINED and UNPREDICTABLE parts of the definitions in Exclusive monitors operations on page B2-35 are provided to cover this behavior.

• The architecture sets an upper limit of 2048 bytes on the size of a region that can be marked as exclusive. Therefore, for performance reasons, ARM recommends that software separates objects that will be accessed by exclusive accesses by at least 2048 bytes. This is a performance guideline rather than a functional requirement.

• LDREX and STREX operations must be performed only on memory with the Normal memory attribute.

• The effect of Data Abort exceptions on the state of monitors is UNPREDICTABLE. ARM recommends that abort handling code performs a CLREX instruction or a dummy STREX instruction to clear the monitor state.

• If the memory attributes for the memory being accessed by an LDREX/STREX pair are changed between the LDREX and the STREX, behavior is UNPREDICTABLE.

A3.4.6 Semaphores

The Swap (SWP) and Swap Byte (SWPB) instructions must be used with care to ensure that expected behavior is observed. Two examples are as follows:

1. A system with multiple bus masters that uses Swap instructions to implement semaphores that control interactions between different bus masters. In this case, the semaphores must be placed in an uncached region of memory, where any buffering of writes occurs at a point common to all bus masters using the mechanism. The Swap instruction then causes a locked read-write bus transaction.

2. A system with multiple threads running on a uniprocessor that uses the Swap instructions to implement semaphores that control interaction of the threads. In this case, the semaphores can be placed in a cached region of memory, and a locked read-write bus transaction might or might not occur. The Swap and Swap Byte instructions are likely to have better performance on such a system than they do on a system with multiple bus masters such as that described in example 1.

Note
From ARMv6, use of the Swap and Swap Byte instructions is deprecated. All new software should use the Load-Exclusive and Store-Exclusive synchronization primitives described in Synchronization and semaphores on page A3-12, for example LDREX and STREX.

A3.4.7 Synchronization primitives and the memory order model

The synchronization primitives follow the memory order model of the memory type accessed by the instructions. For this reason:
• Portable code for claiming a spin-lock must include a Data Memory Barrier (DMB) operation, performed by a DMB instruction, between claiming the spin-lock and making any access that makes use of the spin-lock.
• Portable code for releasing a spin-lock must include a DMB instruction before writing to clear the spin-lock.

This requirement applies to code using:
• the Load-Exclusive/Store-Exclusive instruction pairs, for example LDREX/STREX
• the deprecated synchronization primitives, SWP/SWPB.
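The barrier placement required for a spin-lock can be shown as an ordered trace of operations. The Python sketch below is a single-processor toy model, not ARM code: LockWordModel, claim, release, and max_spins are hypothetical names, the lock encoding (0 free, 1 claimed) is an assumption, and dmb() only records a marker because Python cannot issue a hardware barrier.

```python
class LockWordModel:
    """Single-processor toy model of one lock word with a local monitor."""

    def __init__(self):
        self.lock = 0           # assumed encoding: 0 = free, 1 = claimed
        self.exclusive = False  # True while the monitor is in Exclusive Access
        self.trace = []         # order in which operations were issued

    def ldrex(self):
        self.exclusive = True
        self.trace.append("LDREX")
        return self.lock

    def strex(self, value):
        if not self.exclusive:
            self.trace.append("STREX status 1")
            return 1
        self.lock = value
        self.exclusive = False
        self.trace.append("STREX status 0")
        return 0

    def store(self, value):
        self.lock = value
        self.trace.append("STR")

    def dmb(self):
        # Marker only: this records where a DMB instruction would be placed.
        self.trace.append("DMB")


def claim(m, max_spins=100):
    """Try to claim the lock; return True on success, False after max_spins attempts."""
    for _ in range(max_spins):
        if m.ldrex() == 0 and m.strex(1) == 0:
            m.dmb()  # barrier between claiming the lock and using the protected data
            return True
    return False


def release(m):
    m.dmb()     # barrier before the store that clears the lock
    m.store(0)
```

On a free lock, claim() produces the trace LDREX, STREX status 0, DMB, and release() appends DMB, STR, which is the ordering the two bullet points describe. The max_spins bound exists only so that the model terminates when the lock is never released.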
A3.4.8 Use of WFE and SEV instructions by spin-locks

ARMv7 and ARMv6K provide Wait For Event and Send Event instructions, WFE and SEV, that can assist with reducing power consumption and bus contention caused by processors repeatedly attempting to obtain a spin-lock. These instructions can be used at application level, but a complete understanding of what they do depends on system-level understanding of exceptions. They are described in Wait For Event and Send Event on page B1-44.

A3.5 Memory types and attributes and the memory order model

ARMv6 defined a set of memory attributes with the characteristics required to support the memory and devices in the system memory map. In ARMv7 this set of attributes is extended by the addition of the Outer Shareable attribute for Normal memory.

Note
Whether an ARMv7 implementation supports the Outer Shareable memory attribute is IMPLEMENTATION DEFINED.

The ordering of accesses for regions of memory, referred to as the memory order model, is defined by the memory attributes. This model is described in the following sections:
• Memory types
• Summary of ARMv7 memory attributes
• Atomicity in the ARM architecture
• Normal memory
• Device memory
• Strongly-ordered memory
• Memory access restrictions
• Backwards compatibility
• The effect of the Security Extensions.

A3.5.1 Memory types

For each memory region, the most significant memory attribute specifies the memory type. There are three mutually exclusive memory types:
• Normal
• Device
• Strongly-ordered.

Normal and Device memory regions have additional attributes.

Usually, memory used for program code and for data storage is Normal memory.
Examples of Normal memory technologies are:
• programmed Flash ROM
  Note
  During programming, Flash memory can be ordered more strictly than Normal memory.
• ROM
• SRAM
• DRAM and DDR memory.

System peripherals (I/O) generally conform to different access rules from those for Normal memory. Examples of I/O accesses are:
• FIFOs where consecutive accesses
  — add queued values on write accesses
  — remove queued values on read accesses.
• interrupt controller registers where an access can be used as an interrupt acknowledge, changing the state of the controller itself
• memory controller configuration registers that are used to set up the timing and correctness of areas of Normal memory
• memory-mapped peripherals, where accessing a memory location can cause side effects in the system.

In ARMv7, regions of the memory map for these accesses are defined as Device or Strongly-ordered memory. To ensure system correctness, access rules for Device and Strongly-ordered memory are more restrictive than those for Normal memory:
• both read and write accesses can have side effects
• accesses must not be repeated, for example, on return from an exception
• the number, order and sizes of the accesses must be maintained.

In addition, for Strongly-ordered memory, all memory accesses are strictly ordered to correspond to the program order of the memory access instructions.

A3.5.2 Summary of ARMv7 memory attributes

Table A3-8 summarizes the memory attributes. For more information about these attributes see:
• Normal memory on page A3-28 and Shareable attribute for Device memory regions on page A3-34, for the shareability attribute
• Write-Through Cacheable, Write-Back Cacheable and Non-cacheable Normal memory on page A3-32, for the cacheability attribute.
Table A3-8 Memory attribute summary

Memory type attribute | Shareability | Other attributes | Description
Strongly-ordered | - | - | All memory accesses to Strongly-ordered memory occur in program order. All Strongly-ordered regions are assumed to be Shareable.
Device | Shareable | - | Intended to handle memory-mapped peripherals that are shared by several processors.
Device | Non-shareable | - | Intended to handle memory-mapped peripherals that are used only by a single processor.
Normal | Outer Shareable | Cacheability (a) | The Outer Shareable attribute qualifies the Shareable attribute for Normal memory regions and enables two levels of Normal memory sharing. (b)
Normal | Inner Shareable | Cacheability (a) | Intended to handle Normal memory that is shared between several processors.
Normal | Non-shareable | Cacheability (a) | Intended to handle Normal memory that is used by only a single processor.

For each Normal memory row, Cacheability is one of: Non-cacheable, Write-Through Cacheable, Write-Back Write-Allocate Cacheable, Write-Back no Write-Allocate Cacheable.

a. The cacheability attribute is defined independently for inner and outer cache regions.
b. The significance of the Outer Shareable attribute is IMPLEMENTATION DEFINED.

A3.5.3 Atomicity in the ARM architecture

Atomicity is a feature of memory accesses, described as atomic accesses. The ARM architecture description refers to two types of atomicity, defined in:
• Single-copy atomicity on page A3-27
• Multi-copy atomicity on page A3-28.
Single-copy atomicity

A read or write operation is single-copy atomic if the following conditions are both true:
• After any number of write operations to an operand, the value of the operand is the value written by one of the write operations. It is impossible for part of the value of the operand to come from one write operation and another part of the value to come from a different write operation.
• When a read operation and a write operation are made to the same operand, the value obtained by the read operation is one of:
  — the value of the operand before the write operation
  — the value of the operand after the write operation.
  It is never the case that the value of the read operation is partly the value of the operand before the write operation and partly the value of the operand after the write operation.

In ARMv7, the single-copy atomic processor accesses are:
• all byte accesses
• all halfword accesses to halfword-aligned locations
• all word accesses to word-aligned locations
• memory accesses caused by LDREXD and STREXD instructions to doubleword-aligned locations.

LDM, LDC, LDC2, LDRD, STM, STC, STC2, STRD, PUSH, POP, RFE, SRS, VLDM, VLDR, VSTM, and VSTR instructions are executed as a sequence of word-aligned word accesses. Each 32-bit word access is guaranteed to be single-copy atomic. A subsequence of two or more word accesses from the sequence might not exhibit single-copy atomicity.

Advanced SIMD element and structure loads and stores are executed as a sequence of accesses of the element or structure size. The element accesses are single-copy atomic if and only if both:
• the element size is 32 bits, or smaller
• the elements are naturally aligned.

Accesses to 64-bit elements or structures that are at least word-aligned are executed as a sequence of 32-bit accesses, each of which is single-copy atomic. Subsequences of two or more 32-bit accesses from the sequence might not be single-copy atomic.
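The list of single-copy atomic processor accesses can be restated as a small predicate. The Python sketch below is illustrative only: the function names and the exclusive_doubleword flag are hypothetical, and it covers only the explicit processor accesses listed above, not Advanced SIMD element accesses or implicit accesses.

```python
def is_single_copy_atomic(size_bytes, address, exclusive_doubleword=False):
    """Return True if one access of this size and address is single-copy atomic in ARMv7.

    exclusive_doubleword marks an access made by LDREXD or STREXD.
    """
    if size_bytes == 1:
        return True                       # all byte accesses
    if size_bytes == 2:
        return address % 2 == 0           # halfword-aligned halfword accesses
    if size_bytes == 4:
        return address % 4 == 0           # word-aligned word accesses
    if size_bytes == 8:
        # only LDREXD and STREXD to doubleword-aligned locations
        return exclusive_doubleword and address % 8 == 0
    return False


def multiple_word_accesses(base_address, word_count):
    """Model an LDM or STM style instruction as a sequence of word-aligned word accesses."""
    if base_address % 4 != 0:
        raise ValueError("base address must be word-aligned")
    return [(base_address + 4 * i, 4) for i in range(word_count)]
```

Each element returned by multiple_word_accesses() satisfies the predicate on its own, but the predicate says nothing about the sequence as a whole, which matches the statement that a subsequence of two or more word accesses might not exhibit single-copy atomicity.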
When an access is not single-copy atomic, it is executed as a sequence of smaller accesses, each of which is single-copy atomic, at least at the byte level.

If an instruction is executed as a sequence of accesses according to these rules, some exceptions can be taken in the sequence and cause execution of the instruction to be abandoned. These exceptions are:
• synchronous Data Abort exceptions
• if low interrupt latency configuration is selected and the accesses are to Normal memory, see Low interrupt latency configuration on page B1-43:
  — IRQ interrupts
  — FIQ interrupts
  — asynchronous aborts.

If any of these exceptions are returned from using their preferred exception return, the instruction that generated the sequence of accesses is re-executed and so any accesses that had already been performed before the exception was taken are repeated.

Note
The exception behavior for these multiple access instructions means they are not suitable for use for writes to memory for the purpose of software synchronization.

For implicit accesses:
• Cache linefills and evictions have no effect on the single-copy atomicity of explicit transactions or instruction fetches.
• Instruction fetches are single-copy atomic for each instruction fetched.
  Note
  32-bit Thumb instructions are fetched as two 16-bit items.
• Translation table walks are performed as 32-bit accesses aligned to 32 bits, each of which is single-copy atomic.

Multi-copy atomicity

In a multiprocessing system, writes to a memory location are multi-copy atomic if the following conditions are both true:
• All writes to the same location are serialized, meaning they are observed in the same order by all observers, although some observers might not observe all of the writes.
• A read of a location does not return the value of a write until all observers observe that write.
Writes to Normal memory are not multi-copy atomic.

All writes to Device and Strongly-ordered memory that are single-copy atomic are also multi-copy atomic.

All write accesses to the same location are serialized. Write accesses to Normal memory can be repeated up to the point that another write to the same address is observed.

For Normal memory, serialization of writes does not prohibit the merging of writes.

A3.5.4 Normal memory

Normal memory is idempotent, meaning that it exhibits the following properties:
• read accesses can be repeated with no side effects
• repeated read accesses return the last value written to the resource being read
• read accesses can prefetch additional memory locations with no side effects
• write accesses can be repeated with no side effects, provided that the contents of the location are unchanged between the repeated writes
• unaligned accesses can be supported
• accesses can be merged before accessing the target memory system.

Normal memory can be read/write or read-only, and a Normal memory region is defined as being either Shareable or Non-shareable. In a VMSA implementation, Shareable Normal memory can be either Inner Shareable or Outer Shareable. In a PMSA implementation, no distinction is made between Inner Shareable and Outer Shareable regions.

The Normal memory type attribute applies to most memory used in a system.

Accesses to Normal memory have a weakly consistent model of memory ordering. See a standard text describing memory ordering issues for a description of weakly consistent memory models, for example chapter 2 of Memory Consistency Models for Shared-Memory Multiprocessors, Kourosh Gharachorloo, Stanford University Technical Report CSL-TR-95-685. In general, for Normal memory, barrier operations are required where the order of memory accesses observed by other observers must be controlled.
This requirement applies regardless of the cacheability and shareability attributes of the Normal memory region.

The ordering requirements of accesses described in Ordering requirements for memory accesses on page A3-45 apply to all explicit accesses.

An instruction that generates a sequence of accesses as described in Atomicity in the ARM architecture on page A3-26 might be abandoned as a result of an exception being taken during the sequence of accesses. On return from the exception the instruction is restarted, and therefore one or more of the memory locations might be accessed multiple times. This can result in repeated write accesses to a location that has been changed between the write accesses.

The architecture permits speculative accesses to memory locations marked as Normal if the access permissions and domain permit an access to the locations.

A Normal memory region has shareability attributes that define the data coherency properties of the region. These attributes do not affect the coherency requirements of:
• instruction fetches, see Instruction coherency issues on page A3-53
• translation table walks, if supported, in the base ARMv7 architecture and in versions of the architecture before ARMv7, see TLB maintenance operations and the memory order model on page B3-59.

Non-shareable Normal memory

For a Normal memory region, the Non-shareable attribute identifies Normal memory that is likely to be accessed only by a single processor.

A region of Normal memory with the Non-shareable attribute does not have any requirement to make data accesses by different observers coherent, unless the memory is non-cacheable. If other observers share the memory system, software must use cache maintenance operations if the presence of caches might lead to coherency issues when communicating between the observers.
This cache maintenance requirement is in addition to the barrier operations that are required to ensure memory ordering.

For Non-shareable Normal memory, the Load-Exclusive and Store-Exclusive synchronization primitives do not take account of the possibility of accesses by more than one observer.

Shareable, Inner Shareable, and Outer Shareable Normal memory

For Normal memory, the Shareable and Outer Shareable memory attributes describe Normal memory that is expected to be accessed by multiple processors or other system masters:
• In a VMSA implementation, Normal memory that has the Shareable attribute but not the Outer Shareable attribute assigned is described as having the Inner Shareable attribute.
• In a PMSA implementation, no distinction is made between Inner Shareable and Outer Shareable Normal memory, and you cannot assign the Outer Shareable attribute to Normal memory regions.

A region of Normal memory with the Shareable attribute is one for which data accesses to memory by different observers within the same shareability domain are coherent.

The Outer Shareable attribute is introduced in ARMv7, and can be applied only to a Normal memory region in a VMSA implementation that has the Shareable attribute assigned. It creates three levels of shareability for a Normal memory region:

Non-shareable
A Normal memory region that does not have the Shareable attribute assigned.

Inner Shareable
A Normal memory region that has the Shareable attribute assigned, but not the Outer Shareable attribute.

Outer Shareable
A Normal memory region that has both the Shareable and the Outer Shareable attributes assigned.

These attributes can be used to define sets of observers for which the shareability attributes make the data or unified caches transparent for data accesses. The sets of observers that are affected by the shareability attributes are described as shareability domains. The details of the use of these attributes are system-specific.
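The relationship between the three shareability levels and shareability domains can be expressed as a membership check. The Python sketch below is a hypothetical model: the Observer class, its domain labels, and the function coherency_guaranteed are invented for illustration, and real systems define their domains in a system-specific way.

```python
NON_SHAREABLE = "Non-shareable"
INNER_SHAREABLE = "Inner Shareable"
OUTER_SHAREABLE = "Outer Shareable"


class Observer:
    """An observer with hypothetical labels for the domains it belongs to."""

    def __init__(self, name, inner_domain, outer_domain):
        self.name = name
        self.inner_domain = inner_domain
        self.outer_domain = outer_domain


def coherency_guaranteed(a, b, shareability):
    """Return True if data accesses by observers a and b to a Normal memory
    region with the given shareability level are required to be coherent."""
    if shareability == OUTER_SHAREABLE:
        return a.outer_domain == b.outer_domain
    if shareability == INNER_SHAREABLE:
        return (a.outer_domain, a.inner_domain) == (b.outer_domain, b.inner_domain)
    # Non-shareable: no coherency requirement between different observers.
    return False
```

For a sub-system with two clusters that share one Outer Shareable domain, two processors in the same cluster are coherent for Inner Shareable and Outer Shareable regions, while processors in different clusters are coherent only for Outer Shareable regions.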
Example A3-1 on page A3-31 shows how the shareability attributes might be used:

Example A3-1 Use of shareability attributes

In a VMSA implementation, a particular sub-system with two clusters of processors has the requirement that:
• in each cluster, the data or unified caches of the processors in the cluster are transparent for all data accesses with the Inner Shareable attribute
• however, between the two clusters, the caches:
  — are not transparent for data accesses that have only the Inner Shareable attribute
  — are transparent for data accesses that have the Outer Shareable attribute.

In this system, each cluster is in a different shareability domain for the Inner Shareable attribute, but all components of the sub-system are in the same shareability domain for the Outer Shareable attribute.

A system might implement two such sub-systems. If the data or unified caches of one sub-system are not transparent to the accesses from the other sub-system, this system has two Outer Shareable shareability domains.

Having two levels of shareability attribute means you can reduce the performance and power overhead for shared memory regions that do not need to be part of the Outer Shareable shareability domain.

Whether an ARMv7 implementation supports the Outer Shareable attribute is IMPLEMENTATION DEFINED. If the Outer Shareable attribute is supported, its significance in the implementation is IMPLEMENTATION DEFINED.

For Shareable Normal memory, the Load-Exclusive and Store-Exclusive synchronization primitives take account of the possibility of accesses by more than one observer in the same Shareability domain.

Note
The Shareable concept enables system designers to specify the locations in Normal memory that must have coherency requirements.
However, to facilitate porting of software, software developers must not assume that specifying a memory region as Non-shareable permits software to make assumptions about the incoherency of memory locations between different processors in a shared memory system. Such assumptions are not portable between different multiprocessing implementations that make use of the Shareable concept. Any multiprocessing implementation might implement caches that, inherently, are shared between different processing elements.

Write-Through Cacheable, Write-Back Cacheable and Non-cacheable Normal memory

In addition to being Outer Shareable, Inner Shareable or Non-shareable, each region of Normal memory can be marked as being one of:
• Write-Through Cacheable
• Write-Back Cacheable, with an additional qualifier that marks it as one of:
  — Write-Back, Write-Allocate
  — Write-Back, no Write-Allocate
• Non-cacheable.

If the same memory locations are marked as having different cacheability attributes, for example by the use of aliases in a virtual to physical address mapping, behavior is UNPREDICTABLE.

The cacheability attributes provide a mechanism of coherency control with observers that lie outside the shareability domain of a region of memory. In some cases, the use of Write-Through Cacheable or Non-cacheable regions of memory might provide a better mechanism for controlling coherency than the use of hardware coherency mechanisms or the use of cache maintenance routines.
To this end, the architecture requires the following properties for Non-cacheable or Write-Through Cacheable memory:
• a completed write to a memory location that is Non-cacheable or Write-Through Cacheable for a level of cache made by an observer accessing the memory system inside the level of cache is visible to all observers accessing the memory system outside the level of cache without the need of explicit cache maintenance
• a completed write to a memory location that is Non-cacheable for a level of cache made by an observer accessing the memory system outside the level of cache is visible to all observers accessing the memory system inside the level of cache without the need of explicit cache maintenance.

Note
Implementations can also use the cacheability attributes to provide a performance hint regarding the performance benefit of caching. For example, it might be known to a programmer that a piece of memory is not going to be accessed again and would be better treated as Non-cacheable. The distinction between Write-Back Write-Allocate and Write-Back no Write-Allocate memory exists only as a hint for performance.

The ARM architecture provides independent cacheability attributes for Normal memory for two conceptual levels of cache, the inner and the outer cache. The relationship between these conceptual levels of cache and the implemented physical levels of cache is IMPLEMENTATION DEFINED, and can differ from the boundaries between the Inner and Outer Shareability domains. However:
• inner refers to the innermost caches, and always includes the lowest level of cache
• no cache controlled by the Inner cacheability attributes can lie outside a cache controlled by the Outer cacheability attributes
• an implementation might not have any outer cache.
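The three constraints above limit how physical cache levels can be divided between the Inner and Outer cacheability attributes. The following Python sketch enumerates the permitted divisions. It assumes that every implemented level is controlled by exactly one of the two attribute sets, and the function name `inner_outer_splits` is hypothetical:

```python
def inner_outer_splits(levels: int) -> list:
    """Enumerate the valid (inner, outer) partitions of cache levels L1..Ln."""
    names = [f"L{i}" for i in range(1, levels + 1)]
    # The inner group always contains L1 and is contiguous from it, so that no
    # inner cache lies outside an outer cache. The outer group may be empty.
    return [(names[:k], names[k:]) for k in range(1, levels + 1)]


for inner, outer in inner_outer_splits(3):
    print("inner:", inner, "outer:", outer)
```

For three levels this yields exactly three partitions, one for each way of choosing where the inner group ends.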
Example A3-2 to Example A3-4 describe the three possible ways of implementing a system with three levels of cache, L1 to L3. L1 is the level closest to the processor, see Memory hierarchy on page A3-52.

Example A3-2 Implementation with two inner and one outer cache levels
Implement the three levels of cache in the system, L1 to L3, with:
• the Inner cacheability attribute applied to L1 and L2 cache
• the Outer cacheability attribute applied to L3 cache.

Example A3-3 Implementation with three inner and no outer cache levels
Implement the three levels of cache in the system, L1 to L3, with the Inner cacheability attribute applied to L1, L2, and L3 cache. Do not use the Outer cacheability attribute.

Example A3-4 Implementation with one inner and two outer cache levels
Implement the three levels of cache in the system, L1 to L3, with:
• the Inner cacheability attribute applied to L1 cache
• the Outer cacheability attribute applied to L2 and L3 cache.

A3.5.5 Device memory

The Device memory type attribute defines memory locations where an access to the location can cause side effects, or where the value returned for a load can vary depending on the number of loads performed. Memory-mapped peripherals and I/O locations are examples of memory regions normally marked as being Device memory.

For explicit accesses from the processor to memory marked as Device:
• all accesses occur at their program size
• the number of accesses is the number specified by the program.

An implementation must not repeat an access to a Device memory location if the program has only one access to that location. In other words, accesses to Device memory locations are not restartable.

The architecture does not permit speculative accesses to memory marked as Device.
The architecture permits an Advanced SIMD element or structure load instruction to access bytes in Device memory that are not explicitly accessed by the instruction, provided the bytes accessed are within a 16-byte window, aligned to 16 bytes, that contains at least one byte that is explicitly accessed by the instruction.

Address locations marked as Device are never held in a cache.

All explicit accesses to Device memory must comply with the ordering requirements of accesses described in Ordering requirements for memory accesses on page A3-45.

An instruction that generates a sequence of accesses as described in Atomicity in the ARM architecture on page A3-26 might be abandoned as a result of an exception being taken during the sequence of accesses. On return from the exception the instruction is restarted, and therefore one or more of the memory locations might be accessed multiple times. This can result in repeated write accesses to a location that has been changed between the write accesses.

Note
Do not use an instruction that generates a sequence of accesses to access Device memory if the instruction might generate an abort on any access other than the first one.

Any unaligned access that is not faulted by the alignment restrictions and accesses Device memory has UNPREDICTABLE behavior.

Shareable attribute for Device memory regions

Device memory regions can be given the Shareable attribute. This means that a region of Device memory can be described as either:
• Shareable Device memory
• Non-shareable Device memory.

Non-shareable Device memory is defined as only accessible by a single processor. An example of a system supporting Shareable and Non-shareable Device memory is an implementation that supports both:
• a local bus for its private peripherals
• system peripherals implemented on the main shared system bus.
Such a system might have more predictable access times for local peripherals such as watchdog timers or interrupt controllers. In particular, a specific address in a Non-shareable Device memory region might access a different physical peripheral for each processor.

A3.5.6 Strongly-ordered memory

The Strongly-ordered memory type attribute defines memory locations where an access to the location can cause side effects, or where the value returned for a load can vary depending on the number of loads performed. Examples of memory regions normally marked as being Strongly-ordered are memory-mapped peripherals and I/O locations.

For explicit accesses from the processor to memory marked as Strongly-ordered:
• all accesses occur at their program size
• the number of accesses is the number specified by the program.

An implementation must not repeat an access to a Strongly-ordered memory location if the program has only one access to that location. In other words, accesses to Strongly-ordered memory locations are not restartable.

The architecture does not permit speculative accesses to memory marked as Strongly-ordered.

The architecture permits an Advanced SIMD element or structure load instruction to access bytes in Strongly-ordered memory that are not explicitly accessed by the instruction, provided the bytes accessed are within a 16-byte window, aligned to 16 bytes, that contains at least one byte that is explicitly accessed by the instruction.

Address locations in Strongly-ordered memory are not held in a cache, and are always treated as Shareable memory locations.

All explicit accesses to Strongly-ordered memory must comply with the ordering requirements of accesses described in Ordering requirements for memory accesses on page A3-45.
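The 16-byte window rule for Advanced SIMD element and structure loads, stated above for both Device and Strongly-ordered memory, can be illustrated with a short Python sketch. The function names are hypothetical and addresses are plain integers. A byte that the instruction does not explicitly access may be touched only if its 16-byte-aligned window also holds an explicitly accessed byte:

```python
WINDOW_BYTES = 16


def window_base(addr: int) -> int:
    """Return the base address of the 16-byte-aligned window containing addr."""
    return addr & ~(WINDOW_BYTES - 1)


def may_touch(addr: int, explicit_addrs) -> bool:
    """True if a SIMD load that explicitly accesses explicit_addrs may also access addr."""
    explicit_windows = {window_base(a) for a in explicit_addrs}
    return window_base(addr) in explicit_windows


# A load that explicitly reads the four bytes 0x1004..0x1007:
explicit = range(0x1004, 0x1008)
print(may_touch(0x1000, explicit))  # same aligned window as the explicit bytes
print(may_touch(0x1010, explicit))  # next window, holds no explicit byte
```

For a read-sensitive peripheral this means that registers sharing a 16-byte-aligned block with the target of such a load might also be read.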
An instruction that generates a sequence of accesses as described in Atomicity in the ARM architecture on page A3-26 might be abandoned as a result of an exception being taken during the sequence of accesses. On return from the exception the instruction is restarted, and therefore one or more of the memory locations might be accessed multiple times. This can result in repeated write accesses to a location that has been changed between the write accesses.

Note
Do not use an instruction that generates a sequence of accesses to access Strongly-ordered memory if the instruction might generate an abort on any access other than the first one.

Any unaligned access that is not faulted by the alignment restrictions and accesses Strongly-ordered memory has UNPREDICTABLE behavior.

Note
See Ordering of instructions that change the CPSR interrupt masks on page AppxG-8 for additional requirements that apply to accesses to Strongly-ordered memory in ARMv6.

A3.5.7 Memory access restrictions

The following restrictions apply to memory accesses:
• For any access X, the bytes accessed by X must all have the same memory type attribute, otherwise the behavior of the access is UNPREDICTABLE. That is, an unaligned access that spans a boundary between different memory types is UNPREDICTABLE.
• For any two memory accesses X and Y that are generated by the same instruction, the bytes accessed by X and Y must all have the same memory type attribute, otherwise the results are UNPREDICTABLE. For example, an LDC, LDM, LDRD, STC, STM, or STRD that spans a boundary between Normal and Device memory is UNPREDICTABLE.
• An instruction that generates an unaligned memory access to Device or Strongly-ordered memory is UNPREDICTABLE.
• To ensure access rules are maintained, an instruction that causes multiple accesses to Device or Strongly-ordered memory must not cross a 4KB address boundary, otherwise the effect is UNPREDICTABLE. For this reason, it is important that an access to a volatile memory device is not made using a single instruction that crosses a 4KB address boundary. ARM expects this restriction to impose constraints on the placing of volatile memory devices in the memory map of a system, rather than expecting a compiler to be aware of the alignment of memory accesses.
• For instructions that generate accesses to Device or Strongly-ordered memory, implementations must not change the sequence of accesses specified by the pseudocode of the instruction. This includes not changing:
  — how many accesses there are
  — the time order of the accesses
  — the data sizes and other properties of each access.
  In addition, processor implementations expect any attached memory system to be able to identify the memory type of an access, and to obey similar restrictions with regard to the number, time order, data sizes and other properties of the accesses.
  Exceptions to this rule are:
  — An implementation of a processor can break this rule, provided that the information it supplies to the memory system enables the original number, time order, and other details of the accesses to be reconstructed. In addition, the implementation must place a requirement on attached memory systems to do this reconstruction when the accesses are to Device or Strongly-ordered memory. For example, an implementation with a 64-bit bus might pair the word loads generated by an LDM into 64-bit accesses. This is because the instruction semantics ensure that the 64-bit access is always a word load from the lower address followed by a word load from the higher address.
  However, the implementation must permit the memory systems to unpack the two word loads when the access is to Device or Strongly-ordered memory.
  — Any implementation technique that produces results that cannot be observed to be different from those described above is legitimate.
  — An Advanced SIMD element or structure load instruction can access bytes in Device or Strongly-ordered memory that are not explicitly accessed by the instruction, provided the bytes accessed are within a 16-byte window, aligned to 16 bytes, that contains at least one byte that is explicitly accessed by the instruction.
• Any multi-access instruction that loads or stores the PC must access only Normal memory. If the instruction accesses Device or Strongly-ordered memory the result is UNPREDICTABLE. There is one exception to this restriction. In the VMSA architecture, when the MMU is disabled any multi-access instruction that loads or stores the PC functions correctly, see Enabling and disabling the MMU on page B3-5.
• Any instruction fetch must access only Normal memory. If it accesses Device or Strongly-ordered memory, the result is UNPREDICTABLE. For example, instruction fetches must not be performed to an area of memory that contains read-sensitive devices, because there is no ordering requirement between instruction fetches and explicit accesses.
• Behavior is UNPREDICTABLE if the same memory location:
  — is marked as Shareable Normal and Non-shareable Normal
  — is marked as having different memory types (Normal, Device, or Strongly-ordered)
  — is marked as having different cacheability attributes
  — is marked as being Shareable Device and Non-shareable Device memory.
  Such memory marking contradictions can occur, for example, by the use of aliases in a virtual to physical address mapping.
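Two of the restrictions above are simple to check mechanically: the 4KB boundary rule for multi-access instructions, and contradictory attribute markings of one location through aliases. The Python sketch below illustrates both. All names are hypothetical, addresses are plain integers, and the attribute tuples are an assumed encoding rather than any architectural format:

```python
from collections import defaultdict

BOUNDARY = 4 * 1024
ORDERED_TYPES = {"Device", "Strongly-ordered"}


def crosses_4kb(first_addr: int, total_bytes: int) -> bool:
    """True if the byte range [first_addr, first_addr + total_bytes) spans a 4KB boundary."""
    last_addr = first_addr + total_bytes - 1
    return first_addr // BOUNDARY != last_addr // BOUNDARY


def multi_access_unpredictable(memory_type: str, first_addr: int, total_bytes: int) -> bool:
    """Apply only the 4KB rule: a multi-access instruction to Device or
    Strongly-ordered memory must not cross a 4KB boundary."""
    return memory_type in ORDERED_TYPES and crosses_4kb(first_addr, total_bytes)


def contradictory_locations(mappings) -> set:
    """mappings: iterable of (physical_addr, attrs) pairs, one per alias.
    Returns the physical addresses that are marked with more than one
    distinct attribute tuple."""
    attrs_by_location = defaultdict(set)
    for physical_addr, attrs in mappings:
        attrs_by_location[physical_addr].add(attrs)
    return {addr for addr, seen in attrs_by_location.items() if len(seen) > 1}
```

For example, an 8-byte STRD to Device memory starting at offset 0xFFC of a 4KB block fails the first check, and two virtual aliases of one physical page with different shareability fail the second.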
Before ARMv6, it is IMPLEMENTATION DEFINED whether a low interrupt latency mode is supported. From ARMv6, low interrupt latency support is controlled by the SCTLR.FI bit. It is IMPLEMENTATION DEFINED whether multi-access instructions behave correctly in low interrupt latency configurations.

A3.5.8 Backwards compatibility

From ARMv6, the memory attributes are significantly different from those in previous versions of the architecture. Table A3-9 shows the interpretation of the earlier memory types in the light of this definition.

Table A3-9 Backwards compatibility

Previous architectures                  ARMv6 and ARMv7 attribute
NCNB (Non-cacheable, Non-bufferable)    Strongly-ordered
NCB (Non-cacheable, Bufferable)         Shareable Device
Write-Through Cacheable, Bufferable     Non-shareable Normal, Write-Through Cacheable
Write-Back Cacheable, Bufferable        Non-shareable Normal, Write-Back Cacheable

A3.5.9 The effect of the Security Extensions

The Security Extensions can be included as part of an ARMv7-A implementation, with a VMSA. They provide two distinct 4GByte virtual memory spaces:
• a Secure virtual memory space
• a Non-secure virtual memory space.

The Secure virtual memory space is accessed by memory accesses in the Secure state, and the Non-secure virtual memory space is accessed by memory accesses in the Non-secure state.

By providing different virtual memory spaces, the Security Extensions permit memory accesses made from the Non-secure state to be distinguished from those made from the Secure state.

A3.6 Access rights

ARMv7 includes additional attributes for memory regions, that enable:
• Data accesses to be restricted, based on the privilege of the access. See Privilege level access controls for data accesses.
• Instruction fetches to be restricted, based on the privilege of the process or thread making the fetch.
See Privilege level access controls for instruction accesses. • On a system that implements the Security Extensions, accesses to be restricted to memory accesses with the Secure memory attribute. See Memory region security status on page A3-39. A3.6.1 Privilege level access controls for data accesses The memory attributes can define that a memory region is: • not accessible to any accesses • accessible only to Privileged accesses • accessible to Privileged and Unprivileged accesses. The access privilege level is defined separately for explicit read and explicit write accesses. However, a system that defines the memory attributes is not required to support all combinations of memory attributes for read and write accesses. A Privileged access is an access made during privileged execution, as a result of a load or store operation other than LDRT, STRT, LDRBT, STRBT, LDRHT, STRHT, LDRSHT, or LDRSBT. An Unprivileged access is an access made as a result of load or store operation performed in one of these cases: • when the processor is in an unprivileged mode • when the processor is in any mode and the access is made as a result of a LDRT, STRT, LDRBT, STRBT, LDRHT, STRHT, LDRSHT, or LDRSBT instruction. A Data Abort exception is generated if the processor attempts a data access that the access rights do not permit. For example, a Data Abort exception is generated if the processor is in unprivileged mode and attempts to access a memory region that is marked as only accessible to Privileged accesses. A3.6.2 Privilege level access controls for instruction accesses Memory attributes can define that a memory region is: • Not accessible for execution • Accessible for execution by Privileged processes only • Accessible for execution by Privileged and Unprivileged processes. 
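The data access rules of section A3.6.1 can be summarized in a few lines of Python. This is a sketch only: the function names and the three region settings `"none"`, `"privileged"`, and `"all"` are hypothetical labels for the three cases listed in that section, and the setting passed in is assumed to be the one that applies to the direction (read or write) of the access being checked:

```python
# Load/store forms that always make an Unprivileged access, in any mode.
UNPRIVILEGED_FORMS = {
    "LDRT", "STRT", "LDRBT", "STRBT", "LDRHT", "STRHT", "LDRSHT", "LDRSBT",
}


def is_privileged_access(privileged_mode: bool, mnemonic: str) -> bool:
    """A Privileged access is made in privileged execution by a load or store
    other than the unprivileged forms listed above."""
    return privileged_mode and mnemonic.upper() not in UNPRIVILEGED_FORMS


def causes_data_abort(region_access: str, privileged_mode: bool, mnemonic: str) -> bool:
    """Return True if the access rights of the region do not permit the access."""
    if region_access == "none":
        return True
    if region_access == "privileged":
        return not is_privileged_access(privileged_mode, mnemonic)
    if region_access == "all":
        return False
    raise ValueError(f"unknown region setting: {region_access!r}")
```

Under this model an LDRT executed in a privileged mode is still treated as Unprivileged, so it aborts on a region that is accessible only to Privileged accesses.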
To define the instruction access rights to a memory region, the memory attributes describe, separately, for the region:
• its read access rights, see Privilege level access controls for data accesses
• whether it is suitable for execution.

For example, a region that is accessible for execution by Privileged processes only has the memory attributes:
• accessible only to Privileged read accesses
• suitable for execution.

This means there is some linkage between the memory attributes that define the accessibility of a region to explicit memory accesses, and those that define that a region can be executed.

A memory fault occurs if a processor attempts to execute code from a memory location with attributes that do not permit code execution.

A3.6.3 Memory region security status

An additional memory attribute determines whether the memory region is Secure or Non-secure in an ARMv7-A system that implements the Security Extensions. When the Security Extensions are implemented, this attribute is checked by the system hardware to ensure that a region of memory that is designated as Secure by the system hardware is not accessed by memory accesses with the Non-secure memory attribute. For more information, see Memory region attributes on page B3-32.

A3.7 Virtual and physical addressing

ARMv7 provides three alternative architectural profiles, ARMv7-A, ARMv7-R and ARMv7-M. Each of the profiles specifies a different memory system. This manual describes two of these profiles:

ARMv7-A profile
The ARMv7-A memory system incorporates a Memory Management Unit (MMU), controlled by CP15 registers.
The memory system supports virtual addressing, with the MMU performing virtual to physical address translation, in hardware, as part of program execution.

ARMv7-R profile
The ARMv7-R memory system incorporates a Memory Protection Unit (MPU), controlled by CP15 registers. The MPU does not support virtual addressing.

At the application level, the difference between the ARMv7-A and ARMv7-R memory systems is transparent. Regardless of which profile is implemented, an application accesses the memory map described in Address space on page A3-2, and the implemented memory system makes the features described in this chapter available to the application.

For a system-level description of the ARMv7-A and ARMv7-R memory models see:
• Chapter B2 Common Memory System Architecture Features
• Chapter B3 Virtual Memory System Architecture (VMSA)
• Chapter B4 Protected Memory System Architecture (PMSA).

Note
This manual does not describe the ARMv7-M profile. For details of this profile see:
• ARMv7-M Architecture Application Level Reference Manual, for an application-level description
• ARMv7-M Architecture Reference Manual, for a full description.

A3.8 Memory access order

ARMv7 provides a set of three memory types, Normal, Device, and Strongly-ordered, with well-defined memory access properties.

The ARMv7 application-level view of the memory attributes is described in:
• Memory types and attributes and the memory order model on page A3-24
• Access rights on page A3-38.

When considering memory access ordering, an important feature of the ARMv6 memory model is the Shareable memory attribute, that indicates whether a region of memory can be shared between multiple processors, and therefore requires an appearance of cache transparency in the ordering model.
The key issues with the memory order model depend on the target audience:
• For software programmers, considering the model at the application level, the key factor is that for accesses to Normal memory barriers are required in some situations where the order of accesses observed by other observers must be controlled.
• For silicon implementers, considering the model at the system level, the Strongly-ordered and Device memory attributes place certain restrictions on the system designer in terms of what can be built and when to indicate completion of an access.

Note
Implementations remain free to choose the mechanisms required to implement the functionality of the memory model.

More information about the memory order model is given in the following subsections:
• Reads and writes on page A3-42
• Ordering requirements for memory accesses on page A3-45
• Memory barriers on page A3-47.

Additional attributes and behaviors relate to the memory system architecture. These features are defined in the system level section of this manual:
• Virtual memory systems based on an MMU, described in Chapter B3 Virtual Memory System Architecture (VMSA).
• Protected memory systems based on an MPU, described in Chapter B4 Protected Memory System Architecture (PMSA).
• Caches, described in Caches on page B2-3.

Note
In these system level descriptions, some attributes are described in relation to an MMU. In general, these descriptions can also be applied to an MPU based system.

A3.8.1 Reads and writes

Each memory access is either a read or a write. Explicit memory accesses are the memory accesses required by the function of an instruction. The following can cause memory accesses that are not explicit:
• instruction fetches
• cache loads and writebacks
• translation table walks.
Except where otherwise stated, the memory ordering requirements only apply to explicit memory accesses.

Reads

Reads are defined as memory operations that have the semantics of a load. The memory accesses of the following instructions are reads:
• LDR, LDRB, LDRH, LDRSB, and LDRSH
• LDRT, LDRBT, LDRHT, LDRSBT, and LDRSHT
• LDREX, LDREXB, LDREXD, and LDREXH
• LDM, LDRD, POP, and RFE
• LDC, LDC2, VLDM, VLDR, VLD1, VLD2, VLD3, and VLD4
• the return of status values by STREX, STREXB, STREXD, and STREXH
• in the ARM instruction set only, SWP and SWPB
• in the Thumb instruction set only, TBB and TBH.

Hardware-accelerated opcode execution by the Jazelle extension can cause a number of reads to occur, according to the state of the operand stack and the implementation of the Jazelle hardware acceleration.

Writes

Writes are defined as memory operations that have the semantics of a store. The memory accesses of the following instructions are writes:
• STR, STRB, and STRH
• STRT, STRBT, and STRHT
• STREX, STREXB, STREXD, and STREXH
• STM, STRD, PUSH, and SRS
• STC, STC2, VSTM, VSTR, VST1, VST2, VST3, and VST4
• in the ARM instruction set only, SWP and SWPB.

Hardware-accelerated opcode execution by the Jazelle extension can cause a number of writes to occur, according to the state of the operand stack and the implementation of the Jazelle hardware acceleration.

Synchronization primitives

Synchronization primitives must ensure correct operation of system semaphores in the memory order model. The synchronization primitive instructions are defined as those instructions that are used to ensure memory synchronization:
• LDREX, STREX, LDREXB, STREXB, LDREXD, STREXD, LDREXH, STREXH.
• SWP, SWPB. Use of these instructions is deprecated from ARMv6.

Before ARMv6, support consisted of the SWP and SWPB instructions.
ARMv6 introduced new Load-Exclusive and Store-Exclusive instructions LDREX and STREX, and deprecated using the SWP and SWPB instructions. ARMv7 introduces:
• additional Load-Exclusive and Store-Exclusive instructions, LDREXB, LDREXD, LDREXH, STREXB, STREXD, and STREXH
• the Clear-Exclusive instruction CLREX
• the Load-Exclusive, Store-Exclusive and Clear-Exclusive instructions in the Thumb instruction set.

For details of the Load-Exclusive, Store-Exclusive and Clear-Exclusive instructions see Synchronization and semaphores on page A3-12.

The Load-Exclusive and Store-Exclusive instructions are supported to Shareable and Non-shareable memory. Non-shareable memory can be used to synchronize processes that are running on the same processor. Shareable memory must be used to synchronize processes that might be running on different processors.

Observability and completion

An observer is an agent in the system that can access memory. For a processor, the following mechanisms must be treated as independent observers:
• the mechanism that performs reads or writes to memory
• a mechanism that causes an instruction cache to be filled from memory or that fetches instructions to be executed directly from memory
• a mechanism that performs translation table walks.

The set of observers that can observe a memory access is defined by the system.

For all memory:
• a write to a location in memory is said to be observed by an observer when a subsequent read of the location by the same observer will return the value written by the write
• a write to a location in memory is said to be globally observed for a shareability domain when a subsequent read of the location by any observer in that shareability domain will return the value written by the write
• a read of a location in memory is said to be observed by an observer when a subsequent write to the location by the same observer will have no effect on the value returned by the read
• a read of a location in memory is said to be globally observed for a shareability domain when a subsequent write to the location by any observer in that shareability domain will have no effect on the value returned by the read.

Additionally, for Strongly-ordered memory:
• A read or write of a memory-mapped location in a peripheral that exhibits side-effects is said to be observed, and globally observed, only when the read or write:
  — meets the general conditions listed
  — can begin to affect the state of the memory-mapped peripheral
  — can trigger all associated side effects, whether they affect other peripheral devices, processors or memory.

For all memory, the completion rules are defined as:
• A read or write is complete for a shareability domain when all of the following are true:
  — the read or write is globally observed for that shareability domain
  — any translation table walks associated with the read or write are complete for that shareability domain.
• A translation table walk is complete for a shareability domain when the memory accesses associated with the translation table walk are globally observed for that shareability domain, and the TLB is updated.
• A cache, branch predictor or TLB maintenance operation is complete for a shareability domain when the effects of the operation are globally observed for that shareability domain and any translation table walks that arise from the operation are complete for that shareability domain.

The completion of any cache, branch predictor and TLB maintenance operation includes its completion on all processors that are affected by both the operation and the DSB.
Side effect completion in Strongly-ordered and Device memory

The completion of a memory access in Strongly-ordered or Device memory is not guaranteed to be sufficient to determine that the side effects of the memory access are visible to all observers. The mechanism that ensures the visibility of side-effects of a memory access is IMPLEMENTATION DEFINED.

A3.8.2 Ordering requirements for memory accesses

ARMv7 and ARMv6 define access restrictions in the permitted ordering of memory accesses. These restrictions depend on the memory attributes of the accesses involved.

Two terms used in describing the memory access ordering requirements are:

Address dependency
An address dependency exists when the value returned by a read access is used to compute the virtual address of a subsequent read or write access. An address dependency exists even if the value read by the first read access does not change the virtual address of the second read or write access. This might be the case if the value returned is masked off before it is used, or if it has no effect on the predicted address value for the second access.

Control dependency
A control dependency exists when the data value returned by a read access is used to determine the condition code flags, and the values of the flags are used for condition code checking to determine the address of a subsequent read access. This address determination might be through conditional execution, or through the evaluation of a branch.

Figure A3-4 on page A3-46 shows the memory ordering between two explicit accesses A1 and A2, where A1 occurs before A2 in program order. The symbols used in the figure are as follows:

<   Accesses must be observed in program order, that is, A1 must be observed before A2.
-   Accesses can be observed in any order, provided that the requirements of uniprocessor semantics, for example respecting dependencies between instructions in a single processor, are maintained.

The following additional restrictions apply to the ordering of memory accesses that have this symbol:
• If there is an address dependency then the two memory accesses are observed in program order by any observer in the common shareability domain of the two accesses. This ordering restriction does not apply if there is only a control dependency between the two read accesses. If there is both an address dependency and a control dependency between two read accesses the ordering requirements of the address dependency apply.
• If the value returned by a read access is used as data written by a subsequent write access, then the two memory accesses are observed in program order.
• It is impossible for an observer in the shareability domain of a memory location to observe a write access to that memory location if that location would not be written to in a sequential execution of a program.
• It is impossible for an observer in the shareability domain of a memory location to observe a write value written to that memory location if that value would not be written in a sequential execution of a program.
• It is impossible for an observer in the shareability domain of a memory location to observe two reads to the same memory location performed by the same observer in an order that would not occur in a sequential execution of a program.

In Figure A3-4, an access refers to a read or a write access to the specified memory type. For example, Device access, Non-shareable refers to a read or write access to Non-shareable Device memory.
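The entries of Figure A3-4 can be condensed into a short lookup, sketched here in Python. The function and constant names are hypothetical. The sketch returns `True` where the figure shows `<` and `False` where it shows `-`. The entry for a Normal access paired with a Strongly-ordered access is deliberately left unspecified (`None`), because that entry is not legible in this copy of the figure:

```python
NORMAL = "Normal"
DEVICE_NS = "Device, Non-shareable"
DEVICE_S = "Device, Shareable"
STRONGLY_ORDERED = "Strongly-ordered"


def must_observe_in_program_order(a1: str, a2: str):
    """Return True for '<', False for '-', or None where this sketch has no entry."""
    pair = {a1, a2}
    if pair == {NORMAL, STRONGLY_ORDERED}:
        return None  # assumption: entry not recoverable from this copy of the figure
    if NORMAL in pair:
        return False  # Normal with Normal or Device: any order
    if STRONGLY_ORDERED in pair:
        return True  # Strongly-ordered with Device or Strongly-ordered: in order
    # Two Device accesses are ordered only when their shareability matches.
    return a1 == a2
```

The address-dependency and data-dependency restrictions listed above still apply to pairs for which this lookup returns `False`.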
A1 (rows) / A2 (columns) | Normal access | Device access, Non-shareable | Device access, Shareable | Strongly-ordered access
Normal access | - | - | - | -
Device access, Non-shareable | - | < | - | <
Device access, Shareable | - | - | < | <
Strongly-ordered access | - | < | < | <

Figure A3-4 Memory ordering restrictions

There are no ordering requirements for implicit accesses to any type of memory.

Program order for instruction execution

The program order of instruction execution is the order of the instructions in the control flow trace.

Explicit memory accesses in an execution can be either:

Strictly Ordered
Denoted by <. Must occur strictly in order.

Ordered
Denoted by <=. Can occur either in order or simultaneously.

Load/store multiple instructions, such as LDM, LDRD, STM, and STRD, generate multiple word accesses, each of which is a separate access for the purpose of determining ordering.

The rules for determining program order for two accesses A1 and A2 are:

If A1 and A2 are generated by two different instructions:

• A1 < A2 if the instruction that generates A1 occurs before the instruction that generates A2 in program order
• A2 < A1 if the instruction that generates A2 occurs before the instruction that generates A1 in program order.

If A1 and A2 are generated by the same instruction:

• If A1 and A2 are the load and store generated by a SWP or SWPB instruction:
— A1 < A2 if A1 is the load and A2 is the store
— A2 < A1 if A2 is the load and A1 is the store.
• In these descriptions:
— an LDM-class instruction is any form of LDM, LDMDA, LDMDB, LDMIB, or POP instruction
— an LDC-class instruction is an LDC, VLDM, or VLDR instruction
— an STM-class instruction is any form of STM, STMDA, STMDB, STMIB, or PUSH instruction
— an STC-class instruction is an STC, VSTM, or VSTR instruction.
If A1 and A2 are two word loads generated by an LDC-class or LDM-class instruction, or two word stores generated by an STC-class or STM-class instruction, excluding LDM-class and STM-class instructions with a register list that includes the PC:
— A1 <= A2 if the address of A1 is less than the address of A2
— A2 <= A1 if the address of A2 is less than the address of A1.

If A1 and A2 are two word loads generated by an LDM-class instruction with a register list that includes the PC or two word stores generated by an STM-class instruction with a register list that includes the PC, the program order of the memory accesses is not defined.

• If A1 and A2 are two word loads generated by an LDRD instruction or two word stores generated by an STRD instruction, the program order of the memory accesses is not defined.
• If A1 and A2 are load or store accesses generated by Advanced SIMD element or structure load/store instructions, the program order of the memory accesses is not defined.
• For any instruction or operation not explicitly mentioned in this section, if the single-copy atomicity rules described in Single-copy atomicity on page A3-27 mean the operation becomes a sequence of accesses, then the time-ordering of those accesses is not defined.

A3.8.3 Memory barriers

Memory barrier is the general term applied to an instruction, or sequence of instructions, used to force synchronization events by a processor with respect to retiring load/store instructions. The ARM architecture defines a number of memory barriers that provide a range of functionality, including:

• ordering of issued load/store instructions to the programmers’ model
• completion of preceding load/store instructions to the programmers’ model
• flushing of any instructions prefetched before the memory barrier operation.

ARMv7 and ARMv6 require three explicit memory barriers to support the memory order model described in this chapter.
In ARMv7 the memory barriers are provided as instructions that are available in the ARM and Thumb instruction sets, and in ARMv6 the memory barriers are performed by CP15 register writes. The three memory barriers are:

• Data Memory Barrier, see Data Memory Barrier (DMB) on page A3-48
• Data Synchronization Barrier, see Data Synchronization Barrier (DSB) on page A3-49
• Instruction Synchronization Barrier, see Instruction Synchronization Barrier (ISB) on page A3-49.

Depending on the synchronization needed, a program might use memory barriers on their own, or it might use them in conjunction with cache and memory management maintenance operations that are only available in privileged modes.

The DMB and DSB memory barriers affect reads and writes to the memory system generated by load/store instructions and data or unified cache maintenance operations being executed by the processor. Instruction fetches or accesses caused by a hardware translation table access are not explicit accesses.

Data Memory Barrier (DMB)

The DMB instruction is a data memory barrier. The processor that executes the DMB instruction is referred to as the executing processor, Pe. The DMB instruction takes the required shareability domain and required access types as arguments. If the required shareability is Full system then the operation applies to all observers within the system.

A DMB creates two groups of memory accesses, Group A and Group B:

Group A
Contains:
• All explicit memory accesses of the required access types from observers in the same required shareability domain as Pe that are observed by Pe before the DMB instruction. These accesses include any accesses of the required access types and required shareability domain performed by Pe.
• All loads of required access types from observers in the same required shareability domain as Pe that have been observed by any given observer, Py, in the same required shareability domain as Pe before Py has performed a memory access that is a member of Group A.

Group B
Contains:
• All explicit memory accesses of the required access types by Pe that occur in program order after the DMB instruction.
• All explicit memory accesses of the required access types by any given observer Px in the same required shareability domain as Pe that can only occur after Px has observed a store that is a member of Group B.

Any observer with the same required shareability domain as Pe observes all members of Group A before it observes any member of Group B to the extent that those group members are required to be observed, as determined by the shareability and cacheability of the memory locations accessed by the group members. Where members of Group A and Group B access the same memory-mapped peripheral, all members of Group A will be visible at the memory-mapped peripheral before any members of Group B are visible at that peripheral.

Note
• A memory access might be in neither Group A nor Group B. The DMB does not affect the order of observation of such a memory access.
• The second part of the definition of Group A is recursive. Ultimately, membership of Group A derives from the observation by Py of a load before Py performs an access that is a member of Group A as a result of the first part of the definition of Group A.
• The second part of the definition of Group B is recursive. Ultimately, membership of Group B derives from the observation by any observer of an access by Pe that is a member of Group B as a result of the first part of the definition of Group B.

DMB only affects memory accesses.
It has no effect on the ordering of any other instructions executing on the processor.

For details of the DMB instruction in the Thumb and ARM instruction sets see DMB on page A8-90.

Data Synchronization Barrier (DSB)

The DSB instruction is a special memory barrier that synchronizes the execution stream with memory accesses. The DSB instruction takes the required shareability domain and required access types as arguments. If the required shareability is Full system then the operation applies to all observers within the system.

A DSB behaves as a DMB with the same arguments, and also has the additional properties defined here.

A DSB completes when both:

• all explicit memory accesses that are observed by Pe before the DSB is executed, are of the required access types, and are from observers in the same required shareability domain as Pe, are complete for the set of observers in the required shareability domain
• all cache, branch predictor, and TLB maintenance operations issued by Pe before the DSB are complete for the required shareability domain.

In addition, no instruction that appears in program order after the DSB instruction can execute until the DSB completes.

For details of the DSB instruction in the Thumb and ARM instruction sets see DSB on page A8-92.

Note
Historically, this operation was referred to as Drain Write Buffer or Data Write Barrier (DWB). From ARMv6, these names and the use of DWB were deprecated in favor of the new Data Synchronization Barrier name and DSB abbreviation. DSB better reflects the functionality provided from ARMv6, because DSB is architecturally defined to include all cache, TLB and branch prediction maintenance operations as well as explicit memory operations.

Instruction Synchronization Barrier (ISB)

An ISB instruction flushes the pipeline in the processor, so that all instructions that come after the ISB instruction in program order are fetched from cache or memory only after the ISB instruction has completed.
Using an ISB ensures that the effects of context altering operations executed before the ISB are visible to the instructions fetched after the ISB instruction. Examples of context altering operations that require the insertion of an ISB instruction to ensure the operations are complete are:

• cache, TLB, and branch predictor maintenance operations
• changes to the CP14 and CP15 registers.

In addition, any branches that appear in program order after the ISB instruction are written into the branch prediction logic with the context that is visible after the ISB instruction. This is needed to ensure correct execution of the instruction stream.

Any context altering operations appearing in program order after the ISB instruction only take effect after the ISB has been executed.

For details of the ISB instruction in the Thumb and ARM instruction sets see ISB on page A8-102.

Pseudocode details of memory barriers

The following types define the required shareability domains and required access types used as arguments for DMB and DSB instructions:

    enumeration MBReqDomain {MBReqDomain_FullSystem,
                             MBReqDomain_OuterShareable,
                             MBReqDomain_InnerShareable,
                             MBReqDomain_Nonshareable};

    enumeration MBReqTypes {MBReqTypes_All, MBReqTypes_Writes};

The following procedures perform the memory barriers:

    DataMemoryBarrier(MBReqDomain domain, MBReqTypes types)
    DataSynchronizationBarrier(MBReqDomain domain, MBReqTypes types)
    InstructionSynchronizationBarrier()

A3.9 Caches and memory hierarchy

The implementation of a memory system depends heavily on the microarchitecture and therefore the details of the system are IMPLEMENTATION DEFINED.
ARMv7 defines the application level interface to the memory system, and supports a hierarchical memory system with multiple levels of cache. This section provides an application level view of this system. It contains the subsections:

• Introduction to caches
• Memory hierarchy
• Implication of caches for the application programmer
• Preloading caches.

A3.9.1 Introduction to caches

A cache is a block of high-speed memory that contains a number of entries, each consisting of:

• main memory address information, commonly known as a tag
• the associated data.

Caches are used to increase the average speed of a memory access. Cache operation takes account of two principles of locality:

Spatial locality
An access to one location is likely to be followed by accesses to adjacent locations. Examples of this principle are:
• sequential instruction execution
• accessing a data structure.

Temporal locality
An access to an area of memory is likely to be repeated in a short time period. An example of this principle is the execution of a code loop.

To minimize the quantity of control information stored, the spatial locality property is used to group several locations together under the same tag. This logical block is commonly known as a cache line. When data is loaded into a cache, access times for subsequent loads and stores are reduced, resulting in overall performance benefits. An access to information already in a cache is known as a cache hit, and other accesses are called cache misses.

Normally, caches are self-managing, with the updates occurring automatically. Whenever the processor wants to access a cacheable location, the cache is checked. If the access is a cache hit, the access occurs in the cache, otherwise a location is allocated and the cache line loaded from memory. Different cache topologies and access policies are possible, however, they must comply with the memory coherency model of the underlying architecture.
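The tag, cache line, hit, and miss terms above can be illustrated with a toy model. The following is a minimal sketch, assuming a read-only, direct-mapped cache with a 32-byte line and four entries; the class name, the sizes, and the direct-mapped topology are all assumptions chosen for illustration, because the architecture leaves cache topology to the implementation.

```python
LINE_SIZE = 32   # bytes grouped under one tag (assumed value)
NUM_LINES = 4    # number of cache entries (assumed value)


class DirectMappedCache:
    """Toy read-only cache; hypothetical, for illustration only."""

    def __init__(self, memory):
        self.memory = memory              # backing store, a bytes-like object
        self.lines = [None] * NUM_LINES   # each entry is (tag, data) or None
        self.hits = 0
        self.misses = 0

    def read_byte(self, address):
        offset = address % LINE_SIZE       # position inside the cache line
        line_number = address // LINE_SIZE
        index = line_number % NUM_LINES    # which entry the line maps to
        tag = line_number // NUM_LINES     # address information kept as the tag
        entry = self.lines[index]
        if entry is not None and entry[0] == tag:
            self.hits += 1                 # cache hit: served from the cache
        else:
            self.misses += 1               # cache miss: allocate and load the line
            base = line_number * LINE_SIZE
            entry = (tag, bytes(self.memory[base:base + LINE_SIZE]))
            self.lines[index] = entry
        return entry[1][offset]
```

Reading 64 consecutive bytes from an empty cache of this shape costs two misses, one per line, and 62 hits, which is the spatial locality benefit described above.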
Caches introduce a number of potential problems, mainly because of:

• Memory accesses occurring at times other than when the programmer would normally expect them.
• There being multiple physical locations where a data item can be held.

A3.9.2 Memory hierarchy

Memory close to a processor has very low latency, but is limited in size and expensive to implement. Further from the processor it is easier to implement larger blocks of memory but these have increased latency. To optimize overall performance, an ARMv7 memory system can include multiple levels of cache in a hierarchical memory system. Figure A3-5 shows such a system, in an ARMv7-A implementation of a VMSA, supporting virtual addressing.

[Figure A3-5 Multiple levels of cache in a memory hierarchy. Labels recoverable from the figure: Processor (R15 ... R0; Instruction Prefetch, Load, Store); Virtual address; Address Translation; Physical address; CP15 configuration and control; Level 1 Cache; Level 2 Cache; Level 3 (DRAM, SRAM, Flash, ROM); Level 4 (for example, CF card, disk).]

Note
In this manual, in a hierarchical memory system, Level 1 refers to the level closest to the processor, as shown in Figure A3-5.

A3.9.3 Implication of caches for the application programmer

In normal operation, the caches are largely invisible to the application programmer. However they can become visible when there is a breakdown in the coherency of the caches. Such a breakdown can occur:

• when memory locations are updated by other agents in the system
• when memory updates made from the application code must be made visible to other agents in the system.

For example:

• In a system with a DMA controller that reads memory locations that are held in the data cache of a processor, a breakdown of coherency occurs when the processor has written new data in the data cache, but the DMA controller reads the old data held in memory.
• In a Harvard architecture of caches, where there are separate instruction and data caches, a breakdown of coherency occurs when new instruction data has been written into the data cache, but the instruction cache still contains the old instruction data.

Data coherency issues

You can ensure the data coherency of caches in the following ways:

• By not using the caches in situations where coherency issues can arise. You can achieve this by:
— using Non-cacheable or, in some cases, Write-Through Cacheable memory for the caches
— not enabling caches in the system.
• By using cache maintenance operations to manage the coherency issues in software, see Cache maintenance functionality on page B2-9. Many of these operations are only available to system software.
• By using hardware coherency mechanisms to ensure the coherency of data accesses to memory for cacheable locations by observers within the different shareability domains, see Non-shareable Normal memory on page A3-30 and Shareable, Inner Shareable, and Outer Shareable Normal memory on page A3-30.

The performance of these hardware coherency mechanisms is highly implementation specific. In some implementations the mechanism suppresses the ability to cache shareable locations. In other implementations, cache coherency hardware can hold data in caches while managing coherency between observers within the shareability domains.

Instruction coherency issues

How far ahead of the current point of execution instructions are prefetched from is IMPLEMENTATION DEFINED. Such prefetching can be either a fixed or a dynamically varying number of instructions, and can follow any or all possible future execution paths.
For all types of memory:

• the processor might have fetched the instructions from memory at any time since the last ISB, exception entry or exception return executed by that processor
• any instructions fetched in this way might be executed multiple times, if this is required by the execution of the program, without being refetched from memory.

In addition, the ARM architecture does not require the hardware to ensure coherency between instruction caches and memory, even for regions of memory with Shareable attributes. This means that for cacheable regions of memory, an instruction cache can hold instructions that were fetched from memory before the last ISB, exception entry or exception return.

If software requires coherency between instruction execution and memory, it must manage this coherency using the ISB and DSB memory barriers and cache maintenance operations, see Ordering of cache and branch predictor maintenance operations on page B2-21. Many of these operations are only available to system software.

A3.9.4 Preloading caches

The ARM architecture provides memory system hints PLD (Preload Data) and PLI (Preload Instruction) to permit software to communicate the expected use of memory locations to the hardware. The memory system can respond by taking actions that are expected to speed up the memory accesses if and when they do occur. The effect of these memory system hints is IMPLEMENTATION DEFINED. Typically, implementations will use this information to bring the data or instruction locations into caches that have faster access times than normal memory.

The Preload instructions are hints, and so implementations can treat them as NOPs without affecting the functional behavior of the device.
The instructions do not generate synchronous Data Abort exceptions, but the memory system operations might, under exceptional circumstances, generate asynchronous aborts. For more information, see Data Abort exception on page B1-55.

Hardware implementations can provide other implementation-specific mechanisms to prefetch memory locations in the cache. These must comply with the general cache behavior described in Cache behavior on page B2-5.

Chapter A4 The Instruction Sets

This chapter describes the ARM and Thumb instruction sets. It contains the following sections:

• About the instruction sets
• Unified Assembler Language
• Branch instructions
• Data-processing instructions
• Status register access instructions
• Load/store instructions
• Load/store multiple instructions
• Miscellaneous instructions
• Exception-generating and exception-handling instructions
• Coprocessor instructions
• Advanced SIMD and VFP load/store instructions
• Advanced SIMD and VFP register transfer instructions
• Advanced SIMD data-processing operations
• VFP data-processing instructions.

A4.1 About the instruction sets

ARMv7 contains two main instruction sets, the ARM and Thumb instruction sets. Much of the functionality available is identical in the two instruction sets. This chapter describes the functionality available in the instruction sets, and the Unified Assembler Language (UAL) that can be assembled to either instruction set.

The two instruction sets differ in how instructions are encoded:

• Thumb instructions are either 16-bit or 32-bit, and are aligned on a two-byte boundary.
16-bit and 32-bit instructions can be intermixed freely. Many common operations are most efficiently executed using 16-bit instructions. However:
— Most 16-bit instructions can only access eight of the general-purpose registers, R0-R7. These are known as the low registers. A small number of 16-bit instructions can access the high registers, R8-R15.
— Many operations that would require two or more 16-bit instructions can be more efficiently executed with a single 32-bit instruction.
• ARM instructions are always 32-bit, and are aligned on a four-byte boundary.

The ARM and Thumb instruction sets can interwork freely, that is, different procedures can be compiled or assembled to different instruction sets, and still be able to call each other efficiently.

ThumbEE is a variant of the Thumb instruction set that is designed as a target for dynamically generated code. However, it cannot interwork freely with the ARM and Thumb instruction sets.

See:

• Chapter A5 ARM Instruction Set Encoding for encoding details of the ARM instruction set
• Chapter A6 Thumb Instruction Set Encoding for encoding details of the Thumb instruction set
• Chapter A8 Instruction Details for detailed descriptions of the instructions
• Chapter A9 ThumbEE for encoding details of the ThumbEE instruction set.

A4.1.1 Changing between Thumb state and ARM state

A processor in Thumb state (that is, executing Thumb instructions) can enter ARM state (and change to executing ARM instructions) by executing any of the following instructions: BX, BLX, or an LDR or LDM that loads the PC.

A processor in ARM state (that is, executing ARM instructions) can enter Thumb state (and change to executing Thumb instructions) by executing any of the same instructions.
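For the register and load forms of these instructions, the target instruction set is carried in bit [0] of the branch address. The following is a rough sketch of that selection, not the architecture's own pseudocode: the function name is hypothetical, and rejecting an address whose low two bits are 0b10 is an assumption about how a model might treat a misaligned ARM target.

```python
def interworking_branch(address):
    """Return (instruction_set, target_address) for an interworking address.

    Hypothetical helper for illustration only.
    """
    if address & 1:
        # Bit [0] set: continue in Thumb state at the halfword-aligned address.
        return ("Thumb", address & ~1 & 0xFFFFFFFF)
    if address & 2 == 0:
        # Bits [1:0] clear: continue in ARM state at the word-aligned address.
        return ("ARM", address)
    # Assumption: a target that is not word-aligned is rejected for ARM state.
    raise ValueError("bits [1:0] == 0b10 is not a usable ARM target")
```

This is why a pointer to a Thumb function is normally held with bit [0] set, even though the instruction itself is located at the address with that bit cleared.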
In ARMv7, a processor in ARM state can also enter Thumb state (and change to executing Thumb instructions) by executing an ADC, ADD, AND, ASR, BIC, EOR, LSL, LSR, MOV, MVN, ORR, ROR, RRX, RSB, RSC, SBC, or SUB instruction that has the PC as destination register and does not set the condition flags.

Note
This permits calls and returns between ARM code written for ARMv4 processors and Thumb code running on ARMv7 processors to function correctly. In new code, ARM recommends that you use BX or BLX instructions instead. In particular, use BX LR to return from a procedure, not MOV PC,LR.

The target instruction set is either encoded directly in the instruction (for the immediate offset version of BLX), or is held as bit [0] of an interworking address. For details, see the description of the BXWritePC() function in Pseudocode details of operations on ARM core registers on page A2-12.

Exception entries and returns can also change between ARM and Thumb states. For details see Exceptions on page B1-30.

A4.1.2 Conditional execution

Most ARM instructions can be conditionally executed. This means that they only have their normal effect on the programmers’ model operation, memory and coprocessors if the N, Z, C and V flags in the APSR satisfy a condition specified in the instruction. If the flags do not satisfy this condition, the instruction acts as a NOP, that is, execution advances to the next instruction as normal, including any relevant checks for exceptions being taken, but has no other effect.

Most Thumb instructions are unconditional. Conditional execution in Thumb code can be achieved using any of the following instructions:

• A 16-bit conditional branch instruction, with a branch range of –256 to +254 bytes. For details see B on page A8-44. Before ARMv6T2, this was the only mechanism for conditional execution in Thumb code.
• A 32-bit conditional branch instruction, with a branch range of approximately ±1MB. For details see B on page A8-44.
• 16-bit Compare and Branch on Zero and Compare and Branch on Nonzero instructions, with a branch range of +4 to +130 bytes. For details see CBNZ, CBZ on page A8-66.
• A 16-bit If-Then instruction that makes up to four following instructions conditional. For details see IT on page A8-104. The instructions that are made conditional by an IT instruction are called its IT block. Instructions in an IT block must either all have the same condition, or some can have one condition, and others can have the inverse condition.

For more information about conditional execution see Conditional execution on page A8-8.

A4.2 Unified Assembler Language

This document uses the ARM Unified Assembler Language (UAL). This assembly language syntax provides a canonical form for all ARM and Thumb instructions.

UAL describes the syntax for the mnemonic and the operands of each instruction. In addition, it assumes that instructions and data items can be given labels. It does not specify the syntax to be used for labels, nor what assembler directives and options are available. See your assembler documentation for these details.

Most earlier ARM assembly language mnemonics are still supported as synonyms, as described in the instruction details.

Note
Most earlier Thumb assembly language mnemonics are not supported. For details see Appendix C Legacy Instruction Mnemonics.

UAL includes instruction selection rules that specify which instruction encoding is selected when more than one can provide the required functionality. For example, both 16-bit and 32-bit encodings exist for an ADD R0,R1,R2 instruction.
The most common instruction selection rule is that when both a 16-bit encoding and a 32-bit encoding are available, the 16-bit encoding is selected, to optimize code density.

Syntax options exist to override the normal instruction selection rules and ensure that a particular encoding is selected. These are useful when disassembling code, to ensure that subsequent assembly produces the original code, and in some other situations.

A4.2.1 Conditional instructions

For maximum portability of UAL assembly language between the ARM and Thumb instruction sets, ARM recommends that:

• IT instructions are written before conditional instructions in the correct way for the Thumb instruction set.
• When assembling to the ARM instruction set, assemblers check that any IT instructions are correct, but do not generate any code for them.

Although other Thumb instructions are unconditional, all instructions that are made conditional by an IT instruction must be written with a condition. These conditions must match the conditions imposed by the IT instruction. For example, an ITTEE EQ instruction imposes the EQ condition on the first two following instructions, and the NE condition on the next two. Those four instructions must be written with EQ, EQ, NE and NE conditions respectively.

Some instructions cannot be made conditional by an IT instruction. Some instructions can be conditional if they are the last instruction in the IT block, but not otherwise.

The branch instruction encodings that include a condition field cannot be made conditional by an IT instruction. If the assembler syntax indicates a conditional branch that correctly matches a preceding IT instruction, it is assembled using a branch instruction encoding that does not include a condition field.
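The rule that the conditions written on the instructions must match those imposed by the IT instruction can be sketched as a small check that an assembler might perform. This is an illustrative sketch only: the function name is hypothetical, and leaving out the AL condition and the HS and LO synonyms is a simplifying assumption.

```python
# Each condition mnemonic paired with its inverse.
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
    "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS",
    "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}


def it_block_conditions(mnemonic, cond):
    """Return the condition each instruction in the IT block must carry.

    Hypothetical helper: mnemonic is "IT" followed by up to three T or E
    letters, and cond is the condition written on the IT instruction.
    """
    if not mnemonic.startswith("IT"):
        raise ValueError("not an IT mnemonic")
    suffix = mnemonic[2:]
    if len(suffix) > 3 or any(letter not in "TE" for letter in suffix):
        raise ValueError("IT takes at most three T or E letters")
    if cond not in INVERSE:
        raise ValueError("unsupported condition in this sketch")
    # The first instruction always takes cond; T repeats it, E inverts it.
    return [cond] + [cond if letter == "T" else INVERSE[cond] for letter in suffix]
```

Calling `it_block_conditions("ITTEE", "EQ")` gives the EQ, EQ, NE, NE sequence from the example above.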
A4.2.2 Use of labels in UAL instruction syntax

The UAL syntax for some instructions includes the label of an instruction or a literal data item that is at a fixed offset from the instruction being specified. The assembler must:

1. Calculate the PC or Align(PC,4) value of the instruction. The PC value of an instruction is its address plus 4 for a Thumb instruction, or plus 8 for an ARM instruction. The Align(PC,4) value of an instruction is its PC value ANDed with 0xFFFFFFFC to force it to be word-aligned. There is no difference between the PC and Align(PC,4) values for an ARM instruction, but there can be for a Thumb instruction.
2. Calculate the offset from the PC or Align(PC,4) value of the instruction to the address of the labelled instruction or literal data item.
3. Assemble a PC-relative encoding of the instruction, that is, one that reads its PC or Align(PC,4) value and adds the calculated offset to form the required address.

Note
For instructions that can encode a subtraction operation, if the instruction cannot encode the calculated offset but can encode minus the calculated offset, the instruction encoding specifies a subtraction of minus the calculated offset.

The syntax of the following instructions includes a label:

• B, BL, and BLX (immediate). The assembler syntax for these instructions always specifies the label of the instruction that they branch to. Their encodings specify a sign-extended immediate offset that is added to the PC value of the instruction to form the target address of the branch.
• CBNZ and CBZ. The assembler syntax for these instructions always specifies the label of the instruction that they branch to. Their encodings specify a zero-extended immediate offset that is added to the PC value of the instruction to form the target address of the branch. They do not support backward branches.
• LDC, LDC2, LDR, LDRB, LDRD, LDRH, LDRSB, LDRSH, PLD, PLDW, PLI, and VLDR.
The normal assembler syntax of these load instructions can specify the label of a literal data item that is to be loaded. The encodings of these instructions specify a zero-extended immediate offset that is either added to or subtracted from the Align(PC,4) value of the instruction to form the address of the data item. A few such encodings perform a fixed addition or a fixed subtraction and must only be used when that operation is required, but most contain a bit that specifies whether the offset is to be added or subtracted.

When the assembler calculates an offset of 0 for the normal syntax of these instructions, it must assemble an encoding that adds 0 to the Align(PC,4) value of the instruction. Encodings that subtract 0 from the Align(PC,4) value cannot be specified by the normal syntax.

There is an alternative syntax for these instructions that specifies the addition or subtraction and the immediate offset explicitly. In this syntax, the label is replaced by [PC, #+/-<imm>], where:

+/- Is + or omitted to specify that the immediate offset is to be added to the Align(PC,4) value, or - if it is to be subtracted.
<imm> Is the immediate offset.

This alternative syntax makes it possible to assemble the encodings that subtract 0 from the Align(PC,4) value, and to disassemble them to a syntax that can be re-assembled correctly.

• ADR. The normal assembler syntax for this instruction can specify the label of an instruction or literal data item whose address is to be calculated. Its encoding specifies a zero-extended immediate offset that is either added to or subtracted from the Align(PC,4) value of the instruction to form the address of the data item, and some opcode bits that determine whether it is an addition or subtraction.
When the assembler calculates an offset of 0 for the normal syntax of this instruction, it must assemble the encoding that adds 0 to the Align(PC,4) value of the instruction. The encoding that subtracts 0 from the Align(PC,4) value cannot be specified by the normal syntax.

There is an alternative syntax for this instruction that specifies the addition or subtraction and the immediate value explicitly, by writing them as additions ADD <Rd>,PC,#<imm> or subtractions SUB <Rd>,PC,#<imm>. This alternative syntax makes it possible to assemble the encoding that subtracts 0 from the Align(PC,4) value, and to disassemble it to a syntax that can be re-assembled correctly.

Note
ARM recommends that where possible, you avoid using:
• the alternative syntax for the ADR, LDC, LDC2, LDR, LDRB, LDRD, LDRH, LDRSB, LDRSH, PLD, PLI, PLDW, and VLDR instructions
• the encodings of these instructions that subtract 0 from the Align(PC,4) value.

A4.3 Branch instructions

Table A4-1 summarizes the branch instructions in the ARM and Thumb instruction sets. In addition to providing for changes in the flow of execution, some branch instructions can change instruction set.

Table A4-1 Branch instructions

Instruction | See | Range (Thumb) | Range (ARM)
Branch to target address | B on page A8-44 | +/–16MB | +/–32MB
Compare and Branch on Nonzero, Compare and Branch on Zero | CBNZ, CBZ on page A8-66 | 0-126B | a
Call a subroutine | BL, BLX (immediate) on page A8-58 | +/–16MB | +/–32MB
Call a subroutine, change instruction set b | BL, BLX (immediate) on page A8-58 | +/–16MB | +/–32MB
Call a subroutine, optionally change instruction set | BLX (register) on page A8-60 | Any | Any
Branch to target address, change instruction set | BX on page A8-62 | Any | Any
Change to Jazelle state | BXJ on page A8-64 | - | -
Table Branch (byte offsets) | TBB, TBH on page A8-446 | 0-510B | a
Table Branch (halfword offsets) | TBB, TBH on page A8-446 | 0-131070B | a

a. These instructions do not exist in the ARM instruction set.
b.
The range is determined by the instruction set of the BLX instruction, not of the instruction it branches to.

Branches to loaded and calculated addresses can be performed by LDR, LDM and data-processing instructions. For details see Load/store instructions on page A4-19, Load/store multiple instructions on page A4-22, Standard data-processing instructions on page A4-8, and Shift instructions on page A4-10.

A4.4 Data-processing instructions

Core data-processing instructions belong to one of the following groups:
• Standard data-processing instructions. These instructions perform basic data-processing operations, and share a common format with some variations.
• Shift instructions on page A4-10.
• Saturating instructions on page A4-13.
• Packing and unpacking instructions on page A4-14.
• Miscellaneous data-processing instructions on page A4-15.
• Parallel addition and subtraction instructions on page A4-16.
• Divide instructions on page A4-17.

For extension data-processing instructions, see Advanced SIMD data-processing operations on page A4-30 and VFP data-processing instructions on page A4-38.

A4.4.1 Standard data-processing instructions

These instructions generally have a destination register Rd, a first operand register Rn, and a second operand. The second operand can be another register Rm, or an immediate constant.

If the second operand is an immediate constant, it can be:
• Encoded directly in the instruction.
• A modified immediate constant that uses 12 bits of the instruction to encode a range of constants. Thumb and ARM instructions have slightly different ranges of modified immediate constants. For details see Modified immediate constants in Thumb instructions on page A6-17 and Modified immediate constants in ARM instructions on page A5-9.
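As an illustration of how 12 bits can encode a range of 32-bit constants, the following Python sketch models the ARM-state scheme only, in which the 12 bits are taken to be a 4-bit rotation field followed by an 8-bit value that is rotated right by twice the rotation field. The function names are hypothetical, the Thumb scheme differs and is not modelled, and the sections referenced above remain the authority on both.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def arm_expand_imm(imm12: int) -> int:
    """Expand a 12-bit field (rotation[11:8], imm8[7:0]) to a 32-bit constant."""
    rotation = (imm12 >> 8) & 0xF
    imm8 = imm12 & 0xFF
    return ror32(imm8, 2 * rotation)


def arm_encode_imm(value: int):
    """Return the lowest-rotation 12-bit field that expands to `value`, or None."""
    value &= MASK32
    for rotation in range(16):
        # Rotating by 32 - 2*rotation undoes the right rotation used on expansion.
        imm8 = ror32(value, (32 - 2 * rotation) % 32)
        if imm8 <= 0xFF:
            return (rotation << 8) | imm8
    return None  # not representable as a modified immediate in this model
```

In this model 0xFF000000 is representable (0xFF rotated right by 8 bits), whereas 0x101 is not, because its set bits cannot be brought into a single 8-bit window by an even rotation.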
If the second operand is another register, it can optionally be shifted in any of the following ways:

LSL  Logical Shift Left by 1-31 bits.
LSR  Logical Shift Right by 1-32 bits.
ASR  Arithmetic Shift Right by 1-32 bits.
ROR  Rotate Right by 1-31 bits.
RRX  Rotate Right with Extend. For details see Shift and rotate operations on page A2-5.

In Thumb code, the amount to shift by is always a constant encoded in the instruction. In ARM code, the amount to shift by is either a constant encoded in the instruction, or the value of a register Rs.

For instructions other than CMN, CMP, TEQ, and TST, the result of the data-processing operation is placed in the destination register. In the ARM instruction set, the destination register can be the PC, causing the result to be treated as an address to branch to. In the Thumb instruction set, this is only permitted for some 16-bit forms of the ADD and MOV instructions.

These instructions can optionally set the condition code flags, according to the result of the operation. If they do not set the flags, existing flag settings from a previous instruction are preserved.

Table A4-2 summarizes the main data-processing instructions in the Thumb and ARM instruction sets. Generally, each of these instructions is described in three sections in Chapter A8 Instruction Details, one section for each of the following:
• INSTRUCTION (immediate) where the second operand is a modified immediate constant.
• INSTRUCTION (register) where the second operand is a register, or a register shifted by a constant.
• INSTRUCTION (register-shifted register) where the second operand is a register shifted by a value obtained from another register. These are only available in the ARM instruction set.
Table A4-2 Standard data-processing instructions

Instruction | Mnemonic | Notes
Add with Carry | ADC | -
Add | ADD | Thumb instruction set permits use of a modified immediate constant or a zero-extended 12-bit immediate constant.
Form PC-relative Address | ADR | First operand is the PC. Second operand is an immediate constant. Thumb instruction set uses a zero-extended 12-bit immediate constant. Operation is an addition or a subtraction.
Bitwise AND | AND | -
Bitwise Bit Clear | BIC | -
Compare Negative | CMN | Sets flags. Like ADD but with no destination register.
Compare | CMP | Sets flags. Like SUB but with no destination register.
Bitwise Exclusive OR | EOR | -
Copy operand to destination | MOV | Has only one operand, with the same options as the second operand in most of these instructions. If the operand is a shifted register, the instruction is an LSL, LSR, ASR, or ROR instruction instead. For details see Shift instructions on page A4-10. The ARM and Thumb instruction sets permit use of a modified immediate constant or a zero-extended 16-bit immediate constant.
Bitwise NOT | MVN | Has only one operand, with the same options as the second operand in most of these instructions.
Bitwise OR NOT | ORN | Not available in the ARM instruction set.
Bitwise OR | ORR | -
Reverse Subtract | RSB | Subtracts first operand from second operand. This permits subtraction from constants and shifted registers.
Reverse Subtract with Carry | RSC | Not available in the Thumb instruction set.
Subtract with Carry | SBC | -
Subtract | SUB | Thumb instruction set permits use of a modified immediate constant or a zero-extended 12-bit immediate constant.
Test Equivalence | TEQ | Sets flags. Like EOR but with no destination register.
Test | TST | Sets flags. Like AND but with no destination register.
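The optional shifts of a register second operand listed earlier in this section can be sketched in Python as operations on 32-bit values. This is a minimal model under stated assumptions: the function names are illustrative, the carry-out that the architecture also defines for these shifts is not modelled, and Shift and rotate operations on page A2-5 remains the authoritative definition.

```python
MASK32 = 0xFFFFFFFF


def lsl(value: int, amount: int) -> int:
    """Logical Shift Left; bits shifted past bit 31 are lost."""
    return (value << amount) & MASK32


def lsr(value: int, amount: int) -> int:
    """Logical Shift Right; zeros are shifted in from the top."""
    return (value & MASK32) >> amount


def asr(value: int, amount: int) -> int:
    """Arithmetic Shift Right; copies of bit 31 are shifted in from the top."""
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> amount) & MASK32


def ror(value: int, amount: int) -> int:
    """Rotate Right; bits leaving bit 0 re-enter at bit 31."""
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def rrx(value: int, carry_in: int) -> int:
    """Rotate Right with Extend; shifts right by one, carry flag enters bit 31."""
    return ((carry_in & 1) << 31) | ((value & MASK32) >> 1)
```

For example, `asr(0x80000000, 4)` gives 0xF8000000, while `lsr(0x80000000, 4)` gives 0x08000000; the two differ only in what is shifted in at the top.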
A4.4.2 Shift instructions

Table A4-3 lists the shift instructions in the ARM and Thumb instruction sets.

Table A4-3 Shift instructions

Instruction | See
Arithmetic Shift Right | ASR (immediate) on page A8-40
Arithmetic Shift Right | ASR (register) on page A8-42
Logical Shift Left | LSL (immediate) on page A8-178
Logical Shift Left | LSL (register) on page A8-180
Logical Shift Right | LSR (immediate) on page A8-182
Logical Shift Right | LSR (register) on page A8-184
Rotate Right | ROR (immediate) on page A8-278
Rotate Right | ROR (register) on page A8-280
Rotate Right with Extend | RRX on page A8-282

In the ARM instruction set only, the destination register of these instructions can be the PC, causing the result to be treated as an address to branch to.

A4.4.3 Multiply instructions

These instructions can operate on signed or unsigned quantities. In some types of operation, the results are the same whether the operands are signed or unsigned.
• Table A4-4 summarizes the multiply instructions where there is no distinction between signed and unsigned quantities. The least significant 32 bits of the result are used. More significant bits are discarded.
• Table A4-5 summarizes the signed multiply instructions.
• Table A4-6 on page A4-12 summarizes the unsigned multiply instructions.
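The point that the least significant 32 bits of a product do not depend on signedness can be checked with a short Python sketch. The helper names are illustrative only, and the operand order of the accumulate forms is an assumption based on the operation shapes 32 = 32 + 32 x 32 and 32 = 32 – 32 x 32.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul32(rn: int, rm: int) -> int:
    """MUL-style result: the least significant 32 bits of the product."""
    return (rn * rm) & MASK32


def mla32(rn: int, rm: int, ra: int) -> int:
    """MLA-style result: accumulator plus product, truncated to 32 bits."""
    return (ra + rn * rm) & MASK32


def mls32(rn: int, rm: int, ra: int) -> int:
    """MLS-style result: accumulator minus product, truncated to 32 bits."""
    return (ra - rn * rm) & MASK32


a, b = 0xFFFFFFFE, 3  # a is 4294967294 unsigned, or -2 signed
low_unsigned = mul32(a, b)
low_signed = mul32(to_signed32(a), to_signed32(b))
full_unsigned = (a * b) & MASK64
full_signed = (to_signed32(a) * to_signed32(b)) & MASK64
```

Here `low_unsigned` and `low_signed` are both 0xFFFFFFFA, whereas `full_unsigned` and `full_signed` differ, which is why the 64-bit multiplies need separate signed and unsigned instructions.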
Table A4-4 General multiply instructions

Instruction | See | Operation (number of bits)
Multiply Accumulate | MLA on page A8-190 | 32 = 32 + 32 x 32
Multiply and Subtract | MLS on page A8-192 | 32 = 32 – 32 x 32
Multiply | MUL on page A8-212 | 32 = 32 x 32

Table A4-5 Signed multiply instructions

Instruction | See | Operation (number of bits)
Signed Multiply Accumulate (halfwords) | SMLABB, SMLABT, SMLATB, SMLATT on page A8-330 | 32 = 32 + 16 x 16
Signed Multiply Accumulate Dual | SMLAD on page A8-332 | 32 = 32 + 16 x 16 + 16 x 16
Signed Multiply Accumulate Long | SMLAL on page A8-334 | 64 = 64 + 32 x 32
Signed Multiply Accumulate Long (halfwords) | SMLALBB, SMLALBT, SMLALTB, SMLALTT on page A8-336 | 64 = 64 + 16 x 16
Signed Multiply Accumulate Long Dual | SMLALD on page A8-338 | 64 = 64 + 16 x 16 + 16 x 16
Signed Multiply Accumulate (word by halfword) | SMLAWB, SMLAWT on page A8-340 | 32 = 32 + 32 x 16 a
Signed Multiply Subtract Dual | SMLSD on page A8-342 | 32 = 32 + 16 x 16 – 16 x 16
Signed Multiply Subtract Long Dual | SMLSLD on page A8-344 | 64 = 64 + 16 x 16 – 16 x 16
Signed Most Significant Word Multiply Accumulate | SMMLA on page A8-346 | 32 = 32 + 32 x 32 b
Signed Most Significant Word Multiply Subtract | SMMLS on page A8-348 | 32 = 32 – 32 x 32 b
Signed Most Significant Word Multiply | SMMUL on page A8-350 | 32 = 32 x 32 b
Signed Dual Multiply Add | SMUAD on page A8-352 | 32 = 16 x 16 + 16 x 16
Signed Multiply (halfwords) | SMULBB, SMULBT, SMULTB, SMULTT on page A8-354 | 32 = 16 x 16
Signed Multiply Long | SMULL on page A8-356 | 64 = 32 x 32
Signed Multiply (word by halfword) | SMULWB, SMULWT on page A8-358 | 32 = 32 x 16 a
Signed Dual Multiply Subtract | SMUSD on page A8-360 | 32 = 16 x 16 – 16 x 16

a. The most significant 32 bits of the 48-bit product are used. Less significant bits are discarded.
b.
The most significant 32 bits of the 64-bit product are used. Less significant bits are discarded.

Table A4-6 Unsigned multiply instructions

Instruction | See | Operation (number of bits)
Unsigned Multiply Accumulate Accumulate Long | UMAAL on page A8-482 | 64 = 32 + 32 + 32 x 32
Unsigned Multiply Accumulate Long | UMLAL on page A8-484 | 64 = 64 + 32 x 32
Unsigned Multiply Long | UMULL on page A8-486 | 64 = 32 x 32

A4.4.4 Saturating instructions

Table A4-7 lists the saturating instructions in the ARM and Thumb instruction sets. For more information, see Pseudocode details of saturation on page A2-9.

Table A4-7 Saturating instructions

Instruction | See | Operation
Signed Saturate | SSAT on page A8-362 | Saturates optionally shifted 32-bit value to selected range
Signed Saturate 16 | SSAT16 on page A8-364 | Saturates two 16-bit values to selected range
Unsigned Saturate | USAT on page A8-504 | Saturates optionally shifted 32-bit value to selected range
Unsigned Saturate 16 | USAT16 on page A8-506 | Saturates two 16-bit values to selected range

A4.4.5 Packing and unpacking instructions

Table A4-8 lists the packing and unpacking instructions in the ARM and Thumb instruction sets. These are all available from ARMv6T2 in the Thumb instruction set, and from ARMv6 onwards in the ARM instruction set.
Table A4-8 Packing and unpacking instructions

Instruction | See | Operation
Pack Halfword | PKH on page A8-234 | Combine halfwords
Signed Extend and Add Byte | SXTAB on page A8-434 | Extend 8 bits to 32 and add
Signed Extend and Add Byte 16 | SXTAB16 on page A8-436 | Dual extend 8 bits to 16 and add
Signed Extend and Add Halfword | SXTAH on page A8-438 | Extend 16 bits to 32 and add
Signed Extend Byte | SXTB on page A8-440 | Extend 8 bits to 32
Signed Extend Byte 16 | SXTB16 on page A8-442 | Dual extend 8 bits to 16
Signed Extend Halfword | SXTH on page A8-444 | Extend 16 bits to 32
Unsigned Extend and Add Byte | UXTAB on page A8-514 | Extend 8 bits to 32 and add
Unsigned Extend and Add Byte 16 | UXTAB16 on page A8-516 | Dual extend 8 bits to 16 and add
Unsigned Extend and Add Halfword | UXTAH on page A8-518 | Extend 16 bits to 32 and add
Unsigned Extend Byte | UXTB on page A8-520 | Extend 8 bits to 32
Unsigned Extend Byte 16 | UXTB16 on page A8-522 | Dual extend 8 bits to 16
Unsigned Extend Halfword | UXTH on page A8-524 | Extend 16 bits to 32

A4.4.6 Miscellaneous data-processing instructions

Table A4-9 lists the miscellaneous data-processing instructions in the ARM and Thumb instruction sets. Immediate values in these instructions are simple binary numbers.
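Several of the instructions in this group (BFC, BFI, SBFX, UBFX, and MOVT) act on a field of bits within a 32-bit register. The Python sketch below is an informal model based on the instruction names and on the MOVT note in Table A4-9; the function signatures, with a least significant bit position `lsb` and a field `width`, are assumptions made for illustration, and Chapter A8 Instruction Details defines the actual operands and behaviour.

```python
MASK32 = 0xFFFFFFFF


def ubfx(value: int, lsb: int, width: int) -> int:
    """Unsigned Bit Field Extract: the field, zero-extended to 32 bits."""
    return ((value & MASK32) >> lsb) & ((1 << width) - 1)


def sbfx(value: int, lsb: int, width: int) -> int:
    """Signed Bit Field Extract: the field, sign-extended to 32 bits."""
    field = ubfx(value, lsb, width)
    if field & (1 << (width - 1)):  # top bit of the field is the sign bit
        field -= 1 << width
    return field & MASK32


def bfc(rd: int, lsb: int, width: int) -> int:
    """Bit Field Clear: zero the field, leave the other bits unchanged."""
    mask = ((1 << width) - 1) << lsb
    return rd & ~mask & MASK32


def bfi(rd: int, rn: int, lsb: int, width: int) -> int:
    """Bit Field Insert: copy the low `width` bits of rn into the field of rd."""
    mask = ((1 << width) - 1) << lsb
    return ((rd & ~mask) | ((rn << lsb) & mask)) & MASK32


def movt(rd: int, imm16: int) -> int:
    """Move Top: write imm16 to the top halfword, bottom halfword unchanged."""
    return ((imm16 & 0xFFFF) << 16) | (rd & 0xFFFF)
```

For instance, `ubfx(0x12345678, 8, 8)` extracts 0x56, and `movt(0x12345678, 0xABCD)` yields 0xABCD5678.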
Table A4-9 Miscellaneous data-processing instructions

Instruction | See | Notes
Bit Field Clear | BFC on page A8-46 | -
Bit Field Insert | BFI on page A8-48 | -
Count Leading Zeros | CLZ on page A8-72 | -
Move Top | MOVT on page A8-200 | Moves 16-bit immediate value to top halfword. Bottom halfword unchanged.
Reverse Bits | RBIT on page A8-270 | -
Byte-Reverse Word | REV on page A8-272 | -
Byte-Reverse Packed Halfword | REV16 on page A8-274 | -
Byte-Reverse Signed Halfword | REVSH on page A8-276 | -
Signed Bit Field Extract | SBFX on page A8-308 | -
Select Bytes using GE flags | SEL on page A8-312 | -
Unsigned Bit Field Extract | UBFX on page A8-466 | -
Unsigned Sum of Absolute Differences | USAD8 on page A8-500 | -
Unsigned Sum of Absolute Differences and Accumulate | USADA8 on page A8-502 | -

A4.4.7 Parallel addition and subtraction instructions

These instructions perform additions and subtractions on the values of two registers and write the result to a destination register, treating the register values as sets of two halfwords or four bytes. They are available in ARMv6 and above.

These instructions consist of a prefix followed by a main instruction mnemonic. The prefixes are as follows:

S   Signed arithmetic modulo 2^8 or 2^16.
Q   Signed saturating arithmetic.
SH  Signed arithmetic, halving the results.
U   Unsigned arithmetic modulo 2^8 or 2^16.
UQ  Unsigned saturating arithmetic.
UH  Unsigned arithmetic, halving the results.

The main instruction mnemonics are as follows:

ADD16  Adds the top halfwords of two operands to form the top halfword of the result, and the bottom halfwords of the same two operands to form the bottom halfword of the result.
ASX    Exchanges halfwords of the second operand, and then adds top halfwords and subtracts bottom halfwords.
SAX    Exchanges halfwords of the second operand, and then subtracts top halfwords and adds bottom halfwords.
SUB16  Subtracts each halfword of the second operand from the corresponding halfword of the first operand to form the corresponding halfword of the result.
ADD8   Adds each byte of the second operand to the corresponding byte of the first operand to form the corresponding byte of the result.
SUB8   Subtracts each byte of the second operand from the corresponding byte of the first operand to form the corresponding byte of the result.

The instruction set permits all 36 combinations of prefix and main instruction operand.

See also Advanced SIMD parallel addition and subtraction on page A4-31.

A4.4.8 Divide instructions

In the ARMv7-R profile, the Thumb instruction set includes signed and unsigned integer divide instructions that are implemented in hardware. For details of the instructions see:
• SDIV on page A8-310
• UDIV on page A8-468.

Note
• SDIV and UDIV are UNDEFINED in the ARMv7-A profile.
• The ARMv7-M profile also includes the SDIV and UDIV instructions.

In the ARMv7-R profile, the SCTLR.DZ bit enables divide by zero fault detection, see c1, System Control Register (SCTLR) on page B4-45:

DZ == 0  Divide-by-zero returns a zero result.
DZ == 1  SDIV and UDIV generate an Undefined Instruction exception on a divide-by-zero.

The SCTLR.DZ bit is cleared to zero on reset.

A4.5 Status register access instructions

The MRS and MSR instructions move the contents of the Application Program Status Register (APSR) to or from a general-purpose register.

The APSR is described in The Application Program Status Register (APSR) on page A2-14.

The condition flags in the APSR are normally set by executing data-processing instructions, and are normally used to control the execution of conditional instructions.
However, you can set the flags explicitly using the MSR instruction, and you can read the current state of the flags explicitly using the MRS instruction.

For details of the system level use of status register access instructions CPS, MRS, and MSR, see Chapter B6 System Instructions.

A4.6 Load/store instructions

Table A4-10 summarizes the general-purpose register load/store instructions in the ARM and Thumb instruction sets. See also:
• Load/store multiple instructions on page A4-22
• Advanced SIMD and VFP load/store instructions on page A4-26.

Load/store instructions have several options for addressing memory. For more information, see Addressing modes on page A4-20.

Table A4-10 Load/store instructions

Data type | Load | Store | Load unprivileged | Store unprivileged | Load-Exclusive | Store-Exclusive
32-bit word | LDR | STR | LDRT | STRT | LDREX | STREX
16-bit halfword | - | STRH | - | STRHT | - | STREXH
16-bit unsigned halfword | LDRH | - | LDRHT | - | LDREXH | -
16-bit signed halfword | LDRSH | - | LDRSHT | - | - | -
8-bit byte | - | STRB | - | STRBT | - | STREXB
8-bit unsigned byte | LDRB | - | LDRBT | - | LDREXB | -
8-bit signed byte | LDRSB | - | LDRSBT | - | - | -
Two 32-bit words | LDRD | STRD | - | - | - | -
64-bit doubleword | - | - | - | - | LDREXD | STREXD

A4.6.1 Loads to the PC

The LDR instruction can be used to load a value into the PC. The value loaded is treated as an interworking address, as described by the LoadWritePC() pseudocode function in Pseudocode details of operations on ARM core registers on page A2-12.

A4.6.2 Halfword and byte loads and stores

Halfword and byte stores store the least significant halfword or byte from the register, to 16 or 8 bits of memory respectively. There is no distinction between signed and unsigned stores.

Halfword and byte loads load 16 or 8 bits from memory into the least significant halfword or byte of a register. Unsigned loads zero-extend the loaded value to 32 bits, and signed loads sign-extend the value to 32 bits.
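The difference between zero-extension and sign-extension on loads, and the truncation on stores, can be sketched in Python as follows. The sketch assumes a little-endian byte order and models memory as a Python bytes-like object; the function names are illustrative and do not correspond to anything defined by the architecture.

```python
def load_byte(memory: bytes, address: int, signed: bool) -> int:
    """Model LDRB (signed=False) or LDRSB (signed=True) into a 32-bit register."""
    raw = memory[address]
    if signed and raw & 0x80:
        raw |= 0xFFFFFF00  # replicate bit 7 into bits 31:8
    return raw


def load_halfword(memory: bytes, address: int, signed: bool) -> int:
    """Model LDRH (signed=False) or LDRSH (signed=True), little-endian assumed."""
    raw = int.from_bytes(memory[address:address + 2], "little")
    if signed and raw & 0x8000:
        raw |= 0xFFFF0000  # replicate bit 15 into bits 31:16
    return raw


def store_halfword(memory: bytearray, address: int, register: int) -> None:
    """Model STRH: only the least significant halfword of the register is stored."""
    memory[address:address + 2] = (register & 0xFFFF).to_bytes(2, "little")


mem = bytes([0x80, 0xFF, 0x7F])
unsigned_byte = load_byte(mem, 0, signed=False)
signed_byte = load_byte(mem, 0, signed=True)
```

With the byte 0x80 in memory, the unsigned load gives 0x00000080 and the signed load gives 0xFFFFFF80. A store needs no signed variant because the upper bits of the register are simply not written.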
A4.6.3 Unprivileged loads and stores

In an unprivileged mode, unprivileged loads and stores operate in exactly the same way as the corresponding ordinary operations. In a privileged mode, unprivileged loads and stores are treated as though they were executed in an unprivileged mode. For more information, see Privilege level access controls for data accesses on page A3-38.

A4.6.4 Exclusive loads and stores

Exclusive loads and stores provide for shared memory synchronization. For more information, see Synchronization and semaphores on page A3-12.

A4.6.5 Addressing modes

The address for a load or store is formed from two parts: a value from a base register, and an offset.

The base register can be any one of the general-purpose registers.

For loads, the base register can be the PC. This permits PC-relative addressing for position-independent code. Instructions marked (literal) in their title in Chapter A8 Instruction Details are PC-relative loads.

The offset takes one of three formats:

Immediate  The offset is an unsigned number that can be added to or subtracted from the base register value. Immediate offset addressing is useful for accessing data elements that are a fixed distance from the start of the data object, such as structure fields, stack offsets and input/output registers.

Register  The offset is a value from a general-purpose register. This register cannot be the PC. The value can be added to, or subtracted from, the base register value. Register offsets are useful for accessing arrays or blocks of data.

Scaled register  The offset is a general-purpose register, other than the PC, shifted by an immediate value, then added to or subtracted from the base register. This means an array index can be scaled by the size of each array element.

The offset and base register can be used in three different ways to form the memory address.
The addressing modes are described as follows:

Offset  The offset is added to or subtracted from the base register to form the memory address.

Pre-indexed  The offset is added to or subtracted from the base register to form the memory address. The base register is then updated with this new address, to permit automatic indexing through an array or memory block.

Post-indexed  The value of the base register alone is used as the memory address. The offset is then added to or subtracted from the base register. The result is stored back in the base register, to permit automatic indexing through an array or memory block.

Note
Not every variant is available for every instruction, and the range of permitted immediate values and the options for scaled registers vary from instruction to instruction. See Chapter A8 Instruction Details for full details for each instruction.

A4.7 Load/store multiple instructions

Load Multiple instructions load a subset, or possibly all, of the general-purpose registers from memory.

Store Multiple instructions store a subset, or possibly all, of the general-purpose registers to memory.

The memory locations are consecutive word-aligned words. The addresses used are obtained from a base register, and can be either above or below the value in the base register. The base register can optionally be updated by the total size of the data transferred.

Table A4-11 summarizes the load/store multiple instructions in the ARM and Thumb instruction sets.
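As a rough illustration of how the word addresses relate to the base register, the Python sketch below computes the addresses and the optionally written-back base value for the Increment After (IA), Increment Before (IB), Decrement After (DA), and Decrement Before (DB) variants. The function name and the two-letter mode strings are conventions of this sketch, and the returned addresses are listed from lowest to highest; the instruction descriptions in Chapter A8 are the authoritative definitions.

```python
MASK32 = 0xFFFFFFFF


def multiple_transfer_addresses(base: int, register_count: int, mode: str):
    """Return (word addresses in ascending order, updated base register value)."""
    size = 4 * register_count  # total size of the data transferred, in bytes
    if mode == "IA":      # first word at the base address, addresses go up
        start, new_base = base, base + size
    elif mode == "IB":    # first word one word above the base address
        start, new_base = base + 4, base + size
    elif mode == "DA":    # last (highest) word at the base address
        start, new_base = base - size + 4, base - size
    elif mode == "DB":    # last (highest) word one word below the base address
        start, new_base = base - size, base - size
    else:
        raise ValueError(f"unknown mode: {mode}")
    addresses = [(start + 4 * i) & MASK32 for i in range(register_count)]
    return addresses, new_base & MASK32


push_addresses, sp_after_push = multiple_transfer_addresses(0x1000, 3, "DB")
pop_addresses, sp_after_pop = multiple_transfer_addresses(sp_after_push, 3, "IA")
```

A Decrement Before store of three registers from a base of 0x1000 uses 0xFF4, 0xFF8, and 0xFFC and leaves the base at 0xFF4; an Increment After load from that base reads the same three words and restores the base to 0x1000.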
Table A4-11 Load/store multiple instructions

Instruction | See
Load Multiple, Increment After or Full Descending | LDM / LDMIA / LDMFD on page A8-110
Load Multiple, Decrement After or Full Ascending a | LDMDA / LDMFA on page A8-112
Load Multiple, Decrement Before or Empty Ascending | LDMDB / LDMEA on page A8-114
Load Multiple, Increment Before or Empty Descending a | LDMIB / LDMED on page A8-116
Pop multiple registers off the stack b | POP on page A8-246
Push multiple registers onto the stack c | PUSH on page A8-248
Store Multiple, Increment After or Empty Ascending | STM / STMIA / STMEA on page A8-374
Store Multiple, Decrement After or Empty Descending a | STMDA / STMED on page A8-376
Store Multiple, Decrement Before or Full Descending | STMDB / STMFD on page A8-378
Store Multiple, Increment Before or Full Ascending a | STMIB / STMFA on page A8-380

a. Not available in the Thumb instruction set.
b. This instruction is equivalent to an LDM instruction with the SP as base register, and base register updating.
c. This instruction is equivalent to an STMDB instruction with the SP as base register, and base register updating.

System level variants of the LDM and STM instructions load and store User mode registers from a privileged mode. Another system level variant of the LDM instruction performs an exception return. For details, see Chapter B6 System Instructions.

A4.7.1 Loads to the PC

The LDM, LDMDA, LDMDB, LDMIB, and POP instructions can be used to load a value into the PC. The value loaded is treated as an interworking address, as described by the LoadWritePC() pseudocode function in Pseudocode details of operations on ARM core registers on page A2-12.

A4.8 Miscellaneous instructions

Table A4-12 summarizes the miscellaneous instructions in the ARM and Thumb instruction sets.
Table A4-12 Miscellaneous instructions

Instruction | See
Clear-Exclusive | CLREX on page A8-70
Debug hint | DBG on page A8-88
Data Memory Barrier | DMB on page A8-90
Data Synchronization Barrier | DSB on page A8-92
Instruction Synchronization Barrier | ISB on page A8-102
If Then (makes following instructions conditional) | IT on page A8-104
No Operation | NOP on page A8-222
Preload Data | PLD, PLDW (immediate) on page A8-236; PLD (literal) on page A8-238; PLD, PLDW (register) on page A8-240
Preload Instruction | PLI (immediate, literal) on page A8-242; PLI (register) on page A8-244
Set Endianness | SETEND on page A8-314
Send Event | SEV on page A8-316
Supervisor Call | SVC (previously SWI) on page A8-430
Swap, Swap Byte. Use deprecated. a | SWP, SWPB on page A8-432
Wait For Event | WFE on page A8-808
Wait For Interrupt | WFI on page A8-810
Yield | YIELD on page A8-812

a. Use Load/Store-Exclusive instructions instead, see Load/store instructions on page A4-19.

A4.9 Exception-generating and exception-handling instructions

The following instructions are intended specifically to cause a processor exception to occur:
• The Supervisor Call (SVC, previously SWI) instruction is used to cause an SVC exception to occur. This is the main mechanism for User mode code to make calls to privileged operating system code. For more information, see Supervisor Call (SVC) exception on page B1-52.
• The Breakpoint instruction BKPT provides for software breakpoints. For more information, see About debug events on page C3-2.
• In privileged system level code, the Secure Monitor Call (SMC, previously SMI) instruction. For more information, see Secure Monitor Call (SMC) exception on page B1-53.

System level variants of the SUBS and LDM instructions can be used to return from exceptions.
From ARMv6, the SRS instruction can be used near the start of an exception handler to store return information, and the RFE instruction can be used to return from an exception using the stored return information. For details of these instructions, see Chapter B6 System Instructions.

A4.10 Coprocessor instructions

There are three types of instruction for communicating with coprocessors. These permit the processor to:
• Initiate a coprocessor data-processing operation. For details see CDP, CDP2 on page A8-68.
• Transfer general-purpose registers to and from coprocessor registers. For details, see:
  - MCR, MCR2 on page A8-186
  - MCRR, MCRR2 on page A8-188
  - MRC, MRC2 on page A8-202
  - MRRC, MRRC2 on page A8-204.
• Load or store the values of coprocessor registers. For details, see:
  - LDC, LDC2 (immediate) on page A8-106
  - LDC, LDC2 (literal) on page A8-108
  - STC, STC2 on page A8-372.

The instruction set distinguishes up to 16 coprocessors with a 4-bit field in each coprocessor instruction, so each coprocessor is assigned a particular number.

Note
One coprocessor can use more than one of the 16 numbers if a large coprocessor instruction set is required.

Coprocessors 10 and 11 are used, together, for VFP and some Advanced SIMD functionality. There are different instructions for accessing these coprocessors, of similar types to the instructions for the other coprocessors, that is, to:
• Initiate a coprocessor data-processing operation. For details see VFP data-processing instructions on page A4-38.
• Transfer general-purpose registers to and from coprocessor registers. For details, see Advanced SIMD and VFP register transfer instructions on page A4-29.
• Load or store the values of coprocessor registers. For details, see Advanced SIMD and VFP load/store instructions on page A4-26.
Coprocessors execute the same instruction stream as the processor, ignoring non-coprocessor instructions and coprocessor instructions for other coprocessors. Coprocessor instructions that cannot be executed by any coprocessor hardware cause an Undefined Instruction exception.

For more information about specific coprocessors see Coprocessor support on page A2-68.

A4.11 Advanced SIMD and VFP load/store instructions

Table A4-13 summarizes the extension register load/store instructions in the Advanced SIMD and VFP instruction sets.

Advanced SIMD also provides instructions for loading and storing multiple elements, or structures of elements, see Element and structure load/store instructions on page A4-27.

Table A4-13 Extension register load/store instructions

Instruction | See | Operation
Vector Load Multiple | VLDM on page A8-626 | Load 1-16 consecutive 64-bit registers (Adv. SIMD and VFP); Load 1-16 consecutive 32-bit registers (VFP only)
Vector Load Register | VLDR on page A8-628 | Load one 64-bit register (Adv. SIMD and VFP); Load one 32-bit register (VFP only)
Vector Store Multiple | VSTM on page A8-784 | Store 1-16 consecutive 64-bit registers (Adv. SIMD and VFP); Store 1-16 consecutive 32-bit registers (VFP only)
Vector Store Register | VSTR on page A8-786 | Store one 64-bit register (Adv. SIMD and VFP); Store one 32-bit register (VFP only)

A4.11.1 Element and structure load/store instructions

Table A4-14 shows the element and structure load/store instructions available in the Advanced SIMD instruction set. Loading and storing structures of more than one element automatically de-interleaves or interleaves the elements, see Figure A4-1 on page A4-28 for an example of de-interleaving. Interleaving is the inverse process.
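De-interleaving and its inverse can be sketched in Python with lists standing in for memory and registers. This is only a model of the data movement: the function names are illustrative, element sizes and register widths are ignored, and list index 0 is assumed to correspond to the lowest memory address and to lane 0 of each register.

```python
def deinterleave(memory: list, structure_size: int) -> list:
    """Split consecutive n-element structures into n registers, one per field."""
    return [memory[field::structure_size] for field in range(structure_size)]


def interleave(registers: list) -> list:
    """Inverse process: merge one lane from each register in turn."""
    return [element for lanes in zip(*registers) for element in lanes]


# An array of four 3-element structures, laid out as x, y, z, x, y, z, ...
memory = [f"A[{i}].{field}" for i in range(4) for field in "xyz"]
d0, d1, d2 = deinterleave(memory, 3)
round_trip = interleave([d0, d1, d2])
```

After the call, `d0` holds the four x fields, `d1` the y fields, and `d2` the z fields, and interleaving the three registers reproduces the original memory order.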
Table A4-14 Element and structure load/store instructions

Instruction | See
Load single element |
  Multiple elements | VLD1 (multiple single elements) on page A8-602
  To one lane | VLD1 (single element to one lane) on page A8-604
  To all lanes | VLD1 (single element to all lanes) on page A8-606
Load 2-element structure |
  Multiple structures | VLD2 (multiple 2-element structures) on page A8-608
  To one lane | VLD2 (single 2-element structure to one lane) on page A8-610
  To all lanes | VLD2 (single 2-element structure to all lanes) on page A8-612
Load 3-element structure |
  Multiple structures | VLD3 (multiple 3-element structures) on page A8-614
  To one lane | VLD3 (single 3-element structure to one lane) on page A8-616
  To all lanes | VLD3 (single 3-element structure to all lanes) on page A8-618
Load 4-element structure |
  Multiple structures | VLD4 (multiple 4-element structures) on page A8-620
  To one lane | VLD4 (single 4-element structure to one lane) on page A8-622
  To all lanes | VLD4 (single 4-element structure to all lanes) on page A8-624
Store single element |
  Multiple elements | VST1 (multiple single elements) on page A8-768
  From one lane | VST1 (single element from one lane) on page A8-770
A4-27 The Instruction Sets Table A4-14 Element and structure load/store instructions (continued) Instruction See Store 2-element structure Multiple structures VST2 (multiple 2-element structures) on page A8-772 From one lane VST2 (single 2-element structure from one lane) on page A8-774 Store 3-element structure Multiple structures VST3 (multiple 3-element structures) on page A8-776 From one lane VST3 (single 3-element structure from one lane) on page A8-778 Store 4-element structure Multiple structures VST4 (multiple 4-element structures) on page A8-780 From one lane VST4 (single 4-element structure from one lane) on page A8-782 Memory A[0].x A[0].y A[0].z A[1].x A[1].y A[1].z A[2].x A[2].y A[2].z A[3].x A[3].y A[3].z Z3 Z2 Z1 Z0 D2 Y3 Y2 Y1 Y0 D1 X3 X2 X1 X0 D0 Registers Figure A4-1 De-interleaving an array of 3-element structures A4-28 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B The Instruction Sets A4.12 Advanced SIMD and VFP register transfer instructions Table A4-15 summarizes the extension register transfer instructions in the Advanced SIMD and VFP instruction sets. These instructions transfer data from ARM core registers to extension registers, or from extension registers to ARM core registers. Advanced SIMD vectors, and single-precision and double-precision VFP registers, are all views of the same extension register set. For details see Advanced SIMD and VFP extension registers on page A2-21. 
Table A4-15 Extension register transfer instructions

| Instruction | See |
|---|---|
| Copy element from ARM core register to every element of Advanced SIMD vector | VDUP (ARM core register) on page A8-594 |
| Copy byte, halfword, or word from ARM core register to extension register | VMOV (ARM core register to scalar) on page A8-644 |
| Copy byte, halfword, or word from extension register to ARM core register | VMOV (scalar to ARM core register) on page A8-646 |
| Copy from single-precision VFP register to ARM core register, or from ARM core register to single-precision VFP register | VMOV (between ARM core register and single-precision register) on page A8-648 |
| Copy two words from ARM core registers to consecutive single-precision VFP registers, or from consecutive single-precision VFP registers to ARM core registers | VMOV (between two ARM core registers and two single-precision registers) on page A8-650 |
| Copy two words from ARM core registers to doubleword extension register, or from doubleword extension register to ARM core registers | VMOV (between two ARM core registers and a doubleword extension register) on page A8-652 |
| Copy from Advanced SIMD and VFP extension System Register to ARM core register | VMRS on page A8-658. VMRS on page B6-27 (system level view) |
| Copy from ARM core register to Advanced SIMD and VFP extension System Register | VMSR on page A8-660. VMSR on page B6-29 (system level view) |

A4.13 Advanced SIMD data-processing operations

Advanced SIMD data-processing operations process registers containing vectors of elements of the same type packed together, enabling the same operation to be performed on multiple items in parallel.

Instructions operate on vectors held in 64-bit or 128-bit registers. Figure A4-2 shows an operation on two 64-bit operand vectors, generating a 64-bit vector result.
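As an informal illustration of an element-wise operation on two 64-bit vectors, the following Python sketch models each register as a 64-bit integer holding four 16-bit elements, with element 0 in the least significant bits. The helper names (`split_lanes`, `join_lanes`, `lanewise`) are hypothetical, and the wrapping add used in the example is just one possible choice of operation.

```python
def split_lanes(value, lane_bits, total_bits=64):
    """Return the elements of a packed vector, element 0 first."""
    mask = (1 << lane_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, lane_bits)]


def join_lanes(lanes, lane_bits):
    """Pack elements back into one integer, truncating each to lane_bits."""
    mask = (1 << lane_bits) - 1
    result = 0
    for index, lane in enumerate(lanes):
        result |= (lane & mask) << (index * lane_bits)
    return result


def lanewise(op, dn, dm, lane_bits=16, total_bits=64):
    """Apply op independently to each pair of corresponding elements."""
    pairs = zip(split_lanes(dn, lane_bits, total_bits),
                split_lanes(dm, lane_bits, total_bits))
    return join_lanes([op(a, b) for a, b in pairs], lane_bits)


dn = 0x0004_0003_0002_FFFF
dm = 0x0001_0001_0001_0001
dd = lanewise(lambda a, b: a + b, dn, dm)
print(hex(dd))  # element 0 wraps to 0 without carrying into element 1
```

The point of the example is that a carry out of one element never reaches its neighbour, which is what distinguishes a vector operation from a plain 64-bit add.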
Note
Figure A4-2 and other similar figures show 64-bit vectors that consist of four 16-bit elements, and 128-bit vectors that consist of four 32-bit elements. Other element sizes produce similar figures, but with one, two, eight, or sixteen operations performed in parallel instead of four.

[Figure A4-2 Advanced SIMD instruction operating on 64-bit registers: Op is applied to each pair of corresponding elements of Dn and Dm, and the results are written to Dd.]

Many Advanced SIMD instructions have variants that produce vectors of elements double the size of the inputs. In this case, the number of elements in the result vector is the same as the number of elements in the operand vectors, but each element, and the whole vector, is double the size.

Figure A4-3 shows an example of an Advanced SIMD instruction operating on 64-bit registers, and generating a 128-bit result.

[Figure A4-3 Advanced SIMD instruction producing wider result: Op is applied to each pair of corresponding elements of Dn and Dm, and the double-width results are written to Qd.]

There are also Advanced SIMD instructions that have variants that produce vectors containing elements half the size of the inputs. Figure A4-4 on page A4-31 shows an example of an Advanced SIMD instruction operating on one 128-bit register, and generating a 64-bit result.

[Figure A4-4 Advanced SIMD instruction producing narrower result: Op is applied to each element of Qn, and the half-width results are written to Dd.]

Some Advanced SIMD instructions do not conform to these standard patterns. Their operation patterns are described in the individual instruction descriptions.

Advanced SIMD instructions that perform floating-point arithmetic use the ARM standard floating-point arithmetic defined in Floating-point data types and arithmetic on page A2-32.

A4.13.1 Advanced SIMD parallel addition and subtraction

Table A4-16 shows the Advanced SIMD parallel add and subtract instructions.
Table A4-16 Advanced SIMD parallel add and subtract instructions

| Instruction | See |
|---|---|
| Vector Add | VADD (integer) on page A8-536. VADD (floating-point) on page A8-538 |
| Vector Add and Narrow, returning High Half | VADDHN on page A8-540 |
| Vector Add Long, Vector Add Wide | VADDL, VADDW on page A8-542 |
| Vector Halving Add, Vector Halving Subtract | VHADD, VHSUB on page A8-600 |
| Vector Pairwise Add and Accumulate Long | VPADAL on page A8-682 |
| Vector Pairwise Add | VPADD (integer) on page A8-684. VPADD (floating-point) on page A8-686 |
| Vector Pairwise Add Long | VPADDL on page A8-688 |
| Vector Rounding Add and Narrow, returning High Half | VRADDHN on page A8-726 |
| Vector Rounding Halving Add | VRHADD on page A8-734 |
| Vector Rounding Subtract and Narrow, returning High Half | VRSUBHN on page A8-748 |
| Vector Saturating Add | VQADD on page A8-700 |
| Vector Saturating Subtract | VQSUB on page A8-724 |
| Vector Subtract | VSUB (integer) on page A8-788. VSUB (floating-point) on page A8-790 |
| Vector Subtract and Narrow, returning High Half | VSUBHN on page A8-792 |
| Vector Subtract Long, Vector Subtract Wide | VSUBL, VSUBW on page A8-794 |

A4.13.2 Bitwise Advanced SIMD data-processing instructions

Table A4-17 shows bitwise Advanced SIMD data-processing instructions. These operate on the doubleword (64-bit) or quadword (128-bit) extension registers, and there is no division into vector elements.
Table A4-17 Bitwise Advanced SIMD data-processing instructions

| Instruction | See |
|---|---|
| Vector Bitwise AND | VAND (register) on page A8-544 |
| Vector Bitwise Bit Clear (AND complement) | VBIC (immediate) on page A8-546. VBIC (register) on page A8-548 |
| Vector Bitwise Exclusive OR | VEOR on page A8-596 |
| Vector Bitwise Insert if False, Vector Bitwise Insert if True | VBIF, VBIT, VBSL on page A8-550 |
| Vector Bitwise Move | VMOV (immediate) on page A8-640. VMOV (register) on page A8-642 |
| Vector Bitwise NOT | VMVN (immediate) on page A8-668. VMVN (register) on page A8-670 |
| Vector Bitwise OR | VORR (immediate) on page A8-678. VORR (register) on page A8-680 |
| Vector Bitwise OR NOT | VORN (register) on page A8-676 |
| Vector Bitwise Select | VBIF, VBIT, VBSL on page A8-550 |

A4.13.3 Advanced SIMD comparison instructions

Table A4-18 shows Advanced SIMD comparison instructions.

Table A4-18 Advanced SIMD comparison instructions

| Instruction | See |
|---|---|
| Vector Absolute Compare | VACGE, VACGT, VACLE, VACLT on page A8-534 |
| Vector Compare Equal | VCEQ (register) on page A8-552 |
| Vector Compare Equal to Zero | VCEQ (immediate #0) on page A8-554 |
| Vector Compare Greater Than or Equal | VCGE (register) on page A8-556 |
| Vector Compare Greater Than or Equal to Zero | VCGE (immediate #0) on page A8-558 |
| Vector Compare Greater Than | VCGT (register) on page A8-560 |
| Vector Compare Greater Than Zero | VCGT (immediate #0) on page A8-562 |
| Vector Compare Less Than or Equal to Zero | VCLE (immediate #0) on page A8-564 |
| Vector Compare Less Than Zero | VCLT (immediate #0) on page A8-568 |
| Vector Test Bits | VTST on page A8-802 |

A4.13.4 Advanced SIMD shift instructions

Table A4-19 lists the shift instructions in the Advanced SIMD instruction set.
Table A4-19 Advanced SIMD shift instructions

| Instruction | See |
|---|---|
| Vector Saturating Rounding Shift Left | VQRSHL on page A8-714 |
| Vector Saturating Rounding Shift Right and Narrow | VQRSHRN, VQRSHRUN on page A8-716 |
| Vector Saturating Shift Left | VQSHL (register) on page A8-718. VQSHL, VQSHLU (immediate) on page A8-720 |
| Vector Saturating Shift Right and Narrow | VQSHRN, VQSHRUN on page A8-722 |
| Vector Rounding Shift Left | VRSHL on page A8-736 |
| Vector Rounding Shift Right | VRSHR on page A8-738 |
| Vector Rounding Shift Right and Accumulate | VRSRA on page A8-746 |
| Vector Rounding Shift Right and Narrow | VRSHRN on page A8-740 |
| Vector Shift Left | VSHL (immediate) on page A8-750. VSHL (register) on page A8-752 |
| Vector Shift Left Long | VSHLL on page A8-754 |
| Vector Shift Right | VSHR on page A8-756 |
| Vector Shift Right and Narrow | VSHRN on page A8-758 |
| Vector Shift Left and Insert | VSLI on page A8-760 |
| Vector Shift Right and Accumulate | VSRA on page A8-764 |
| Vector Shift Right and Insert | VSRI on page A8-766 |

A4.13.5 Advanced SIMD multiply instructions

Table A4-20 summarizes the Advanced SIMD multiply instructions.
Table A4-20 Advanced SIMD multiply instructions

| Instruction | See |
|---|---|
| Vector Multiply Accumulate, Vector Multiply Accumulate Long, Vector Multiply Subtract, Vector Multiply Subtract Long | VMLA, VMLAL, VMLS, VMLSL (integer) on page A8-634. VMLA, VMLS (floating-point) on page A8-636. VMLA, VMLAL, VMLS, VMLSL (by scalar) on page A8-638 |
| Vector Multiply, Vector Multiply Long | VMUL, VMULL (integer and polynomial) on page A8-662. VMUL (floating-point) on page A8-664. VMUL, VMULL (by scalar) on page A8-666 |
| Vector Saturating Doubling Multiply Accumulate Long, Vector Saturating Doubling Multiply Subtract Long | VQDMLAL, VQDMLSL on page A8-702 |
| Vector Saturating Doubling Multiply Returning High Half | VQDMULH on page A8-704 |
| Vector Saturating Rounding Doubling Multiply Returning High Half | VQRDMULH on page A8-712 |
| Vector Saturating Doubling Multiply Long | VQDMULL on page A8-706 |

Advanced SIMD multiply instructions can operate on vectors of:
• 8-bit, 16-bit, or 32-bit unsigned integers
• 8-bit, 16-bit, or 32-bit signed integers
• 8-bit or 16-bit polynomials over {0,1} (VMUL and VMULL only)
• single-precision (32-bit) floating-point numbers.

They can also act on one vector and one scalar.

Long instructions have doubleword (64-bit) operands, and produce quadword (128-bit) results. Other Advanced SIMD multiply instructions can have either doubleword or quadword operands, and produce results of the same size.

VFP multiply instructions can operate on:
• single-precision (32-bit) floating-point numbers
• double-precision (64-bit) floating-point numbers.

Some VFP implementations do not support double-precision numbers.

A4.13.6 Miscellaneous Advanced SIMD data-processing instructions

Table A4-21 shows miscellaneous Advanced SIMD data-processing instructions.
Table A4-21 Miscellaneous Advanced SIMD data-processing instructions

| Instruction | See |
|---|---|
| Vector Absolute Difference and Accumulate | VABA, VABAL on page A8-526 |
| Vector Absolute Difference | VABD, VABDL (integer) on page A8-528. VABD (floating-point) on page A8-530 |
| Vector Absolute | VABS on page A8-532 |
| Vector Convert between floating-point and fixed-point | VCVT (between floating-point and fixed-point, Advanced SIMD) on page A8-580 |
| Vector Convert between floating-point and integer | VCVT (between floating-point and integer, Advanced SIMD) on page A8-576 |
| Vector Convert between half-precision and single-precision | VCVT (between half-precision and single-precision, Advanced SIMD) on page A8-586 |
| Vector Count Leading Sign Bits | VCLS on page A8-566 |
| Vector Count Leading Zeros | VCLZ on page A8-570 |
| Vector Count Set Bits | VCNT on page A8-574 |
| Vector Duplicate scalar | VDUP (scalar) on page A8-592 |
| Vector Extract | VEXT on page A8-598 |
| Vector Move and Narrow | VMOVN on page A8-656 |
| Vector Move Long | VMOVL on page A8-654 |
| Vector Maximum, Minimum | VMAX, VMIN (integer) on page A8-630. VMAX, VMIN (floating-point) on page A8-632 |
| Vector Negate | VNEG on page A8-672 |
| Vector Pairwise Maximum, Minimum | VPMAX, VPMIN (integer) on page A8-690. VPMAX, VPMIN (floating-point) on page A8-692 |
| Vector Reciprocal Estimate | VRECPE on page A8-728 |
| Vector Reciprocal Step | VRECPS on page A8-730 |
| Vector Reciprocal Square Root Estimate | VRSQRTE on page A8-742 |
Table A4-21 Miscellaneous Advanced SIMD data-processing instructions (continued)

| Instruction | See |
|---|---|
| Vector Reciprocal Square Root Step | VRSQRTS on page A8-744 |
| Vector Reverse | VREV16, VREV32, VREV64 on page A8-732 |
| Vector Saturating Absolute | VQABS on page A8-698 |
| Vector Saturating Move and Narrow | VQMOVN, VQMOVUN on page A8-708 |
| Vector Saturating Negate | VQNEG on page A8-710 |
| Vector Swap | VSWP on page A8-796 |
| Vector Table Lookup | VTBL, VTBX on page A8-798 |
| Vector Transpose | VTRN on page A8-800 |
| Vector Unzip | VUZP on page A8-804 |
| Vector Zip | VZIP on page A8-806 |

A4.14 VFP data-processing instructions

Table A4-22 summarizes the data-processing instructions in the VFP instruction set.

For details of the floating-point arithmetic used by VFP instructions, see Floating-point data types and arithmetic on page A2-32.

Table A4-22 VFP data-processing instructions

| Instruction | See |
|---|---|
| Absolute value | VABS on page A8-532 |
| Add | VADD (floating-point) on page A8-538 |
| Compare (optionally with exceptions enabled) | VCMP, VCMPE on page A8-572 |
| Convert between floating-point and integer | VCVT, VCVTR (between floating-point and integer, VFP) on page A8-578 |
| Convert between floating-point and fixed-point | VCVT (between floating-point and fixed-point, VFP) on page A8-582 |
| Convert between double-precision and single-precision | VCVT (between double-precision and single-precision) on page A8-584 |
| Convert between half-precision and single-precision | VCVTB, VCVTT (between half-precision and single-precision, VFP) on page A8-588 |
| Divide | VDIV on page A8-590 |
| Multiply Accumulate, Multiply Subtract | VMLA, VMLS (floating-point) on page A8-636 |
| Move immediate value to extension register | VMOV (immediate) on page A8-640 |
| Copy from one extension register to another | VMOV (register) on page A8-642 |
| Multiply | VMUL (floating-point) on page A8-664 |
| Negate (invert the sign bit) | VNEG on page A8-672 |
| Multiply Accumulate and Negate, Multiply Subtract and Negate, Multiply and Negate | VNMLA, VNMLS, VNMUL on page A8-674 |
| Square Root | VSQRT on page A8-762 |
| Subtract | VSUB (floating-point) on page A8-790 |

Chapter A5 ARM Instruction Set Encoding

This chapter describes the encoding of the ARM instruction set. It contains the following sections:
• ARM instruction set encoding on page A5-2
• Data-processing and miscellaneous instructions on page A5-4
• Load/store word and unsigned byte on page A5-19
• Media instructions on page A5-21
• Branch, branch with link, and block data transfer on page A5-27
• Supervisor Call, and coprocessor instructions on page A5-28
• Unconditional instructions on page A5-30.

Note
• Architecture variant information in this chapter describes the architecture variant or extension in which the instruction encoding was introduced into the ARM instruction set. All means that the instruction encoding was introduced in ARMv4 or earlier, and so is in all variants of the ARM instruction set covered by this manual.
• In the decode tables in this chapter, an entry of - for a field value means the value of the field does not affect the decoding.

A5.1 ARM instruction set encoding

[Encoding diagram: cond in bits [31:28], op1 in bits [27:25], op in bit [4].]

The ARM instruction stream is a sequence of word-aligned words. Each ARM instruction is a single 32-bit word in that stream.

Table A5-1 shows the major subdivisions of the ARM instruction set, determined by bits [31:25,4].

Most ARM instructions can be conditional, with a condition determined by bits [31:28] of the instruction, the cond field. For details see The condition field. This applies to all instructions except those with the cond field equal to 0b1111.
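The top-level decode on bits [31:25,4] can be sketched in Python as follows. The function names `bits` and `classify` are hypothetical, the returned strings are simply the class names used in Table A5-1, and the sketch stops at the major subdivision: it does not decode any individual instruction.

```python
def bits(word, hi, lo):
    """Extract the bit field word[hi:lo] (inclusive) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def classify(instr):
    """Map a 32-bit ARM instruction word to its Table A5-1 instruction class."""
    cond = bits(instr, 31, 28)
    op1 = bits(instr, 27, 25)
    op = bits(instr, 4, 4)

    if cond == 0b1111:
        return "Unconditional instructions"
    if op1 in (0b000, 0b001):                       # op1 == 00x
        return "Data-processing and miscellaneous instructions"
    if op1 == 0b010 or (op1 == 0b011 and op == 0):
        return "Load/store word and unsigned byte"
    if op1 == 0b011:                                # op1 == 011, op == 1
        return "Media instructions"
    if op1 in (0b100, 0b101):                       # op1 == 10x
        return "Branch, branch with link, and block data transfer"
    return "Supervisor Call, and coprocessor instructions"  # op1 == 11x


# Example words: ADD r0, r1, r2 / LDR r0, [r1] / B . / SVC #0
for word in (0xE0810002, 0xE5910000, 0xEAFFFFFE, 0xEF000000):
    print(f"{word:#010x}: {classify(word)}")
```

Checking the cond field first mirrors the table, where cond == 0b1111 overrides the op1 and op columns.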
Table A5-1 ARM instruction encoding

| cond | op1 | op | Instruction classes |
|---|---|---|---|
| not 1111 | 00x | - | Data-processing and miscellaneous instructions on page A5-4. |
| not 1111 | 010 | - | Load/store word and unsigned byte on page A5-19. |
| not 1111 | 011 | 0 | Load/store word and unsigned byte on page A5-19. |
| not 1111 | 011 | 1 | Media instructions on page A5-21. |
| not 1111 | 10x | - | Branch, branch with link, and block data transfer on page A5-27. |
| not 1111 | 11x | - | Supervisor Call, and coprocessor instructions on page A5-28. Includes VFP instructions and Advanced SIMD data transfers, see Chapter A7 Advanced SIMD and VFP Instruction Encoding. |
| 1111 | - | - | If the cond field is 0b1111, the instruction can only be executed unconditionally, see Unconditional instructions on page A5-30. Includes Advanced SIMD instructions, see Chapter A7 Advanced SIMD and VFP Instruction Encoding. |

A5.1.1 The condition field

Every conditional instruction contains a 4-bit condition code field in bits 31 to 28:

[Encoding diagram: cond in bits [31:28].]

This field contains one of the values 0b0000-0b1110 described in Table A8-1 on page A8-8. Most instruction mnemonics can be extended with the letters defined in the mnemonic extension field.

If the always (AL) condition is specified, the instruction is executed irrespective of the value of the condition code flags. The absence of a condition code on an instruction mnemonic implies the AL condition code.

A5.1.2 UNDEFINED and UNPREDICTABLE instruction set space

An attempt to execute an unallocated instruction results in either:
• Unpredictable behavior. The instruction is described as UNPREDICTABLE.
• An Undefined Instruction exception. The instruction is described as UNDEFINED.

An instruction is UNDEFINED if it is declared as UNDEFINED in an instruction description, or in this chapter.
An instruction is UNPREDICTABLE if:
• it is declared as UNPREDICTABLE in an instruction description or in this chapter
• the pseudocode for that encoding does not indicate that a different special case applies, and a bit marked (0) or (1) in the encoding diagram of an instruction is not 0 or 1 respectively.

Unless otherwise specified:
• ARM instructions introduced in an architecture variant are UNDEFINED in earlier architecture variants.
• ARM instructions introduced in one or more architecture extensions are UNDEFINED if none of those extensions are implemented.

A5.1.3 The PC and the use of 0b1111 as a register specifier

In ARM instructions, the use of 0b1111 as a register specifier specifies the PC. Many instructions are UNPREDICTABLE if they use 0b1111 as a register specifier. This is specified by pseudocode in the instruction description.

Note
Use of the PC as the base register in any store instruction is deprecated in ARMv7.

A5.1.4 The SP and the use of 0b1101 as a register specifier

In ARM instructions, the use of 0b1101 as a register specifier specifies the SP.

ARM deprecates:
• using SP for any purpose other than as a stack pointer
• using the SP in ARM instructions in ways other than those listed in 32-bit Thumb instruction support for R13 on page A6-4, except that ARM does not deprecate the use of instructions of the following form that write a word-aligned address to SP:
SUB SP, , #

A5.2 Data-processing and miscellaneous instructions

[Encoding diagram, fields in order from bit 31 down to bit 0: cond, 0 0, op, op1, op2.]

Table A5-2 shows the allocation of encodings in this space.
Table A5-2 Data-processing and miscellaneous instructions

| op | op1 | op2 | Instruction or instruction class | Variant |
|---|---|---|---|---|
| 0 | not 10xx0 | xxx0 | Data-processing (register) on page A5-5 | - |
| 0 | not 10xx0 | 0xx1 | Data-processing (register-shifted register) on page A5-7 | - |
| 0 | 10xx0 | 0xxx | Miscellaneous instructions on page A5-18 | - |
| 0 | 10xx0 | 1xx0 | Halfword multiply and multiply-accumulate on page A5-13 | - |
| 0 | 0xxxx | 1001 | Multiply and multiply-accumulate on page A5-12 | - |
| 0 | 1xxxx | 1001 | Synchronization primitives on page A5-16 | - |
| 0 | not 0xx1x | 1011 | Extra load/store instructions on page A5-14 | - |
| 0 | not 0xx1x | 11x1 | Extra load/store instructions on page A5-14 | - |
| 0 | 0xx1x | 1011 | Extra load/store instructions (unprivileged) on page A5-15 | - |
| 0 | 0xx1x | 11x1 | Extra load/store instructions (unprivileged) on page A5-15 | - |
| 1 | not 10xx0 | - | Data-processing (immediate) on page A5-8 | - |
| 1 | 10000 | - | 16-bit immediate load (MOV (immediate) on page A8-194) | v6T2 |
| 1 | 10100 | - | High halfword 16-bit immediate load (MOVT on page A8-200) | v6T2 |
| 1 | 10x10 | - | MSR (immediate), and hints on page A5-17 | - |

A5.2.1 Data-processing (register)

[Encoding diagram, fields in order from bit 31 down to bit 0: cond, 0 0 0, op1, op2, op3, 0.]

If op1 == 0b10xx0, see Data-processing and miscellaneous instructions on page A5-4.

Table A5-3 shows the allocation of encodings in this space. These encodings are in all architecture variants.
Table A5-3 Data-processing (register) instructions

| op1 | op2 | op3 | Instruction | See |
|---|---|---|---|---|
| 0000x | - | - | Bitwise AND | AND (register) on page A8-36 |
| 0001x | - | - | Bitwise Exclusive OR | EOR (register) on page A8-96 |
| 0010x | - | - | Subtract | SUB (register) on page A8-422 |
| 0011x | - | - | Reverse Subtract | RSB (register) on page A8-286 |
| 0100x | - | - | Add | ADD (register) on page A8-24 |
| 0101x | - | - | Add with Carry | ADC (register) on page A8-16 |
| 0110x | - | - | Subtract with Carry | SBC (register) on page A8-304 |
| 0111x | - | - | Reverse Subtract with Carry | RSC (register) on page A8-292 |
| 10001 | - | - | Test | TST (register) on page A8-456 |
| 10011 | - | - | Test Equivalence | TEQ (register) on page A8-450 |
| 10101 | - | - | Compare | CMP (register) on page A8-82 |
| 10111 | - | - | Compare Negative | CMN (register) on page A8-76 |
| 1100x | - | - | Bitwise OR | ORR (register) on page A8-230 |
| 1101x | 00000 | 00 | Move | MOV (register) on page A8-196 |
| 1101x | not 00000 | 00 | Logical Shift Left | LSL (immediate) on page A8-178 |
| 1101x | - | 01 | Logical Shift Right | LSR (immediate) on page A8-182 |
| 1101x | - | 10 | Arithmetic Shift Right | ASR (immediate) on page A8-40 |
| 1101x | 00000 | 11 | Rotate Right with Extend | RRX on page A8-282 |
| 1101x | not 00000 | 11 | Rotate Right | ROR (immediate) on page A8-278 |
| 1110x | - | - | Bitwise Bit Clear | BIC (register) on page A8-52 |
| 1111x | - | - | Bitwise NOT | MVN (register) on page A8-216 |

A5.2.2 Data-processing (register-shifted register)

[Encoding diagram, fields in order from bit 31 down to bit 0: cond, 0 0 0, op1, 0, op2, 1.]

If op1 == 0b10xx0, see Data-processing and miscellaneous instructions on page A5-4.

Table A5-4 shows the allocation of encodings in this space. These encodings are in all architecture variants.
Table A5-4 Data-processing (register-shifted register) instructions

| op1 | op2 | Instruction | See |
|---|---|---|---|
| 0000x | - | Bitwise AND | AND (register-shifted register) on page A8-38 |
| 0001x | - | Bitwise Exclusive OR | EOR (register-shifted register) on page A8-98 |
| 0010x | - | Subtract | SUB (register-shifted register) on page A8-424 |
| 0011x | - | Reverse Subtract | RSB (register-shifted register) on page A8-288 |
| 0100x | - | Add | ADD (register-shifted register) on page A8-26 |
| 0101x | - | Add with Carry | ADC (register-shifted register) on page A8-18 |
| 0110x | - | Subtract with Carry | SBC (register-shifted register) on page A8-306 |
| 0111x | - | Reverse Subtract with Carry | RSC (register-shifted register) on page A8-294 |
| 10001 | - | Test | TST (register-shifted register) on page A8-458 |
| 10011 | - | Test Equivalence | TEQ (register-shifted register) on page A8-452 |
| 10101 | - | Compare | CMP (register-shifted register) on page A8-84 |
| 10111 | - | Compare Negative | CMN (register-shifted register) on page A8-78 |
| 1100x | - | Bitwise OR | ORR (register-shifted register) on page A8-232 |
| 1101x | 00 | Logical Shift Left | LSL (register) on page A8-180 |
| 1101x | 01 | Logical Shift Right | LSR (register) on page A8-184 |
| 1101x | 10 | Arithmetic Shift Right | ASR (register) on page A8-42 |
| 1101x | 11 | Rotate Right | ROR (register) on page A8-280 |
| 1110x | - | Bitwise Bit Clear | BIC (register-shifted register) on page A8-54 |
| 1111x | - | Bitwise NOT | MVN (register-shifted register) on page A8-218 |

A5.2.3 Data-processing (immediate)

[Encoding diagram, fields in order from bit 31 down to bit 0: cond, 0 0 1, op, Rn.]

If op == 0b10xx0, see Data-processing and miscellaneous instructions on page A5-4.

Table A5-5 shows the allocation of encodings in this space. These encodings are in all architecture variants.
Table A5-5 Data-processing (immediate) instructions

| op | Rn | Instruction | See |
|---|---|---|---|
| 0000x | - | Bitwise AND | AND (immediate) on page A8-34 |
| 0001x | - | Bitwise Exclusive OR | EOR (immediate) on page A8-94 |
| 0010x | not 1111 | Subtract | SUB (immediate, ARM) on page A8-420 |
| 0010x | 1111 | Form PC-relative address | ADR on page A8-32 |
| 0011x | - | Reverse Subtract | RSB (immediate) on page A8-284 |
| 0100x | not 1111 | Add | ADD (immediate, ARM) on page A8-22 |
| 0100x | 1111 | Form PC-relative address | ADR on page A8-32 |
| 0101x | - | Add with Carry | ADC (immediate) on page A8-14 |
| 0110x | - | Subtract with Carry | SBC (immediate) on page A8-302 |
| 0111x | - | Reverse Subtract with Carry | RSC (immediate) on page A8-290 |
| 10001 | - | Test | TST (immediate) on page A8-454 |
| 10011 | - | Test Equivalence | TEQ (immediate) on page A8-448 |
| 10101 | - | Compare | CMP (immediate) on page A8-80 |
| 10111 | - | Compare Negative | CMN (immediate) on page A8-74 |
| 1100x | - | Bitwise OR | ORR (immediate) on page A8-228 |
| 1101x | - | Move | MOV (immediate) on page A8-194 |
| 1110x | - | Bitwise Bit Clear | BIC (immediate) on page A8-50 |
| 1111x | - | Bitwise NOT | MVN (immediate) on page A8-214 |

These instructions all have modified immediate constants, rather than a simple 12-bit binary number. This provides a more useful range of values. For details see Modified immediate constants in ARM instructions on page A5-9.

A5.2.4 Modified immediate constants in ARM instructions

[Encoding diagram: a 12-bit field consisting of the 4-bit rotation field followed by the eight bits a b c d e f g h.]

Table A5-6 shows the range of modified immediate constants available in ARM data-processing instructions, and how they are encoded in the a, b, c, d, e, f, g, h, and rotation fields in the instruction.
Table A5-6 Encoding of modified immediates in ARM processing instructions

| rotation | Immediate constant (see note a) |
|---|---|
| 0000 | 00000000 00000000 00000000 abcdefgh |
| 0001 | gh000000 00000000 00000000 00abcdef |
| 0010 | efgh0000 00000000 00000000 0000abcd |
| 0011 | cdefgh00 00000000 00000000 000000ab |
| 0100 | abcdefgh 00000000 00000000 00000000 |
| ... | 8-bit values shifted to other even-numbered positions |
| 1001 | 00000000 00abcdef gh000000 00000000 |
| ... | 8-bit values shifted to other even-numbered positions |
| 1110 | 00000000 00000000 0000abcd efgh0000 |
| 1111 | 00000000 00000000 000000ab cdefgh00 |

a. In this table, the immediate constant value is shown in binary form, to relate abcdefgh to the encoding diagram. In assembly syntax, the immediate value is specified in the usual way (a decimal number by default).

Note
The range of values available in ARM modified immediate constants is slightly different from the range of values available in 32-bit Thumb instructions. See Modified immediate constants in Thumb instructions on page A6-17.

Carry out

A logical instruction with rotation == 0b0000 does not affect APSR.C. Otherwise, a logical instruction that sets the flags sets APSR.C to the value of bit [31] of the modified immediate constant.

Constants with multiple encodings

Some constant values have multiple possible encodings. In this case, a UAL assembler must select the encoding with the lowest unsigned value of the rotation field. This is the encoding that appears first in Table A5-6 on page A5-9. For example, the constant #3 must be encoded with (rotation, abcdefgh) == (0b0000, 0b00000011), not (0b0001, 0b00001100), (0b0010, 0b00110000), or (0b0011, 0b11000000).

In particular, this means that all constants in the range 0-255 are encoded with rotation == 0b0000, and permitted constants outside that range are encoded with rotation != 0b0000.
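The selection rule for constants with multiple encodings can be sketched as a small search over the 16 rotation values, lowest first. This is a hypothetical helper, not part of any assembler: the names `ror32` and `encode_modified_immediate` are assumptions, and the function returns the 12-bit field (rotation in the upper four bits, abcdefgh in the lower eight) or `None` when the constant has no encoding.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_modified_immediate(const):
    """Return the imm12 encoding of const with the lowest rotation, or None."""
    if not 0 <= const <= 0xFFFFFFFF:
        raise ValueError("constant must fit in 32 bits")
    for rotation in range(16):
        # The encoded value is abcdefgh rotated right by 2*rotation, so
        # rotating the constant left by the same amount recovers abcdefgh.
        imm8 = ror32(const, (32 - 2 * rotation) % 32)
        if imm8 <= 0xFF:
            return (rotation << 8) | imm8
    return None


print(hex(encode_modified_immediate(3)))           # rotation 0b0000 is chosen
print(hex(encode_modified_immediate(0xFF000000)))  # rotation 0b0100
print(encode_modified_immediate(0x101))            # not encodable
```

Because the loop tries rotation values in ascending order and stops at the first match, every constant in the range 0-255 is returned with rotation == 0b0000, matching the rule stated above.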
A flag-setting logical instruction with a modified immediate constant therefore leaves APSR.C unchanged if the constant is in the range 0-255 and sets it to the most significant bit of the constant otherwise. This matches the behavior of Thumb modified immediate constants for all constants that are permitted in both the ARM and Thumb instruction sets.

An alternative syntax is available for a modified immediate constant that permits the programmer to specify the encoding directly. In this syntax, #<const> is instead written as #<byte>,#<rot>, where:

<byte> is the numeric value of abcdefgh, in the range 0-255
<rot> is twice the numeric value of rotation, an even number in the range 0-30.

This syntax permits all ARM data-processing instructions with modified immediate constants to be disassembled to assembler syntax that will assemble to the original instruction.

This syntax also makes it possible to write variants of some flag-setting logical instructions that have different effects on APSR.C to those obtained with the normal #<const> syntax. For example, ANDS R1,R2,#12,#2 has the same behavior as ANDS R1,R2,#3 except that it sets APSR.C to 0 instead of leaving it unchanged. Such variants of flag-setting logical instructions do not have equivalents in the Thumb instruction set, and their use is deprecated.

Operation

    // ARMExpandImm()
    // ==============

    bits(32) ARMExpandImm(bits(12) imm12)
        // APSR.C argument to following function call does not affect the imm32 result.
        (imm32, -) = ARMExpandImm_C(imm12, APSR.C);
        return imm32;

    // ARMExpandImm_C()
    // ================

    (bits(32), bit) ARMExpandImm_C(bits(12) imm12, bit carry_in)
        unrotated_value = ZeroExtend(imm12<7:0>, 32);
        (imm32, carry_out) = Shift_C(unrotated_value, SRType_ROR, 2*UInt(imm12<11:8>), carry_in);
        return (imm32, carry_out);
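For readers who want to experiment with the expansion, here is a minimal Python rendering of the pseudocode above. It is a sketch rather than a reference model: the function names are Python-style stand-ins for ARMExpandImm_C() and ARMExpandImm(), APSR.C is passed in as a plain integer, and Shift_C is assumed to return the carry input unchanged when the rotate amount is zero, which is consistent with the Carry out rule stated earlier in this section.

```python
def arm_expand_imm_c(imm12, carry_in):
    """Expand a 12-bit modified immediate; return (imm32, carry_out)."""
    unrotated = imm12 & 0xFF             # imm12<7:0>, zero-extended
    amount = 2 * ((imm12 >> 8) & 0xF)    # 2 * UInt(imm12<11:8>)
    if amount == 0:
        # No rotation: value is unchanged and the carry passes through.
        return unrotated, carry_in
    imm32 = ((unrotated >> amount) | (unrotated << (32 - amount))) & 0xFFFFFFFF
    return imm32, imm32 >> 31            # carry_out is bit [31] of the result


def arm_expand_imm(imm12):
    """Expand a 12-bit modified immediate, discarding the carry."""
    imm32, _ = arm_expand_imm_c(imm12, 0)
    return imm32


# #3 written normally (rotation 0) versus as #12,#2 (rotation field 1):
print(arm_expand_imm_c(0x003, 1))   # value 3, carry left as passed in
print(arm_expand_imm_c(0x10C, 1))   # value 3, carry forced to bit [31] == 0
```

The two calls reproduce the ANDS R1,R2,#3 versus ANDS R1,R2,#12,#2 example: both yield the value 3, but only the second changes the carry.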
A5.2.5 Multiply and multiply-accumulate

[Encoding diagram, fields in order from bit 31 down to bit 0: cond, 0 0 0 0, op, 1001.]

Table A5-7 shows the allocation of encodings in this space.

Table A5-7 Multiply and multiply-accumulate instructions

| op | Instruction | See | Variant |
|---|---|---|---|
| 000x | Multiply | MUL on page A8-212 | All |
| 001x | Multiply Accumulate | MLA on page A8-190 | All |
| 0100 | Unsigned Multiply Accumulate Accumulate Long | UMAAL on page A8-482 | v6 |
| 0101 | UNDEFINED | - | - |
| 0110 | Multiply and Subtract | MLS on page A8-192 | v6T2 |
| 0111 | UNDEFINED | - | - |
| 100x | Unsigned Multiply Long | UMULL on page A8-486 | All |
| 101x | Unsigned Multiply Accumulate Long | UMLAL on page A8-484 | All |
| 110x | Signed Multiply Long | SMULL on page A8-356 | All |
| 111x | Signed Multiply Accumulate Long | SMLAL on page A8-334 | All |

A5.2.6 Saturating addition and subtraction

[Encoding diagram, fields in order from bit 31 down to bit 0: cond, 0 0 0 1 0, op, 0, 0101.]

Table A5-8 shows the allocation of encodings in this space. These encodings are all available in ARMv5TE and above, and are UNDEFINED in earlier variants of the architecture.

Table A5-8 Saturating addition and subtraction instructions

| op | Instruction | See |
|---|---|---|
| 00 | Saturating Add | QADD on page A8-250 |
| 01 | Saturating Subtract | QSUB on page A8-264 |
| 10 | Saturating Double and Add | QDADD on page A8-258 |
| 11 | Saturating Double and Subtract | QDSUB on page A8-260 |

A5.2.7 Halfword multiply and multiply-accumulate

[Encoding diagram, fields in order from bit 31 down to bit 0: cond, 0 0 0 1 0, op1, 0, 1, op, 0.]

Table A5-9 shows the allocation of encodings in this space.

These encodings are signed multiply (SMUL) and signed multiply-accumulate (SMLA) instructions, operating on 16-bit values, or mixed 16-bit and 32-bit values. The results and accumulators are 32-bit or 64-bit.
These encodings are all available in ARMv5TE and above, and are UNDEFINED in earlier variants of the architecture.

Table A5-9 Halfword multiply and multiply-accumulate instructions

op1  op  Instruction                                         See
00   -   Signed 16-bit multiply, 32-bit accumulate           SMLABB, SMLABT, SMLATB, SMLATT on page A8-330
01   0   Signed 16-bit x 32-bit multiply, 32-bit accumulate  SMLAWB, SMLAWT on page A8-340
01   1   Signed 16-bit x 32-bit multiply, 32-bit result      SMULWB, SMULWT on page A8-358
10   -   Signed 16-bit multiply, 64-bit accumulate           SMLALBB, SMLALBT, SMLALTB, SMLALTT on page A8-336
11   -   Signed 16-bit multiply, 32-bit result               SMULBB, SMULBT, SMULTB, SMULTT on page A8-354

A5.2.8 Extra load/store instructions

[Encoding diagram, bits 31 to 0: cond | 0 0 0 | op1 | Rn | 1 | op2 | 1]

If op1 == 0b0xx1x or op2 == 0b00, see Data-processing and miscellaneous instructions on page A5-4.

Table A5-10 shows the allocation of encodings in this space.
Table A5-10 Extra load/store instructions

op2  op1    Rn        Instruction           See                                   Variant
01   xx0x0  -         Store Halfword        STRH (register) on page A8-412        All
01   xx0x1  -         Load Halfword         LDRH (register) on page A8-156        All
01   xx1x0  -         Store Halfword        STRH (immediate, ARM) on page A8-410  All
01   xx1x1  not 1111  Load Halfword         LDRH (immediate, ARM) on page A8-152  All
01   xx1x1  1111      Load Halfword         LDRH (literal) on page A8-154         All
10   xx0x0  -         Load Dual             LDRD (register) on page A8-140        v5TE
10   xx0x1  -         Load Signed Byte      LDRSB (register) on page A8-164       All
10   xx1x0  not 1111  Load Dual             LDRD (immediate) on page A8-136       v5TE
10   xx1x0  1111      Load Dual             LDRD (literal) on page A8-138         v5TE
10   xx1x1  not 1111  Load Signed Byte      LDRSB (immediate) on page A8-160      All
10   xx1x1  1111      Load Signed Byte      LDRSB (literal) on page A8-162        All
11   xx0x0  -         Store Dual            STRD (register) on page A8-398        All
11   xx0x1  -         Load Signed Halfword  LDRSH (register) on page A8-172       All
11   xx1x0  -         Store Dual            STRD (immediate) on page A8-396       All
11   xx1x1  not 1111  Load Signed Halfword  LDRSH (immediate) on page A8-168      All
11   xx1x1  1111      Load Signed Halfword  LDRSH (literal) on page A8-170        All

A5.2.9 Extra load/store instructions (unprivileged)

[Encoding diagram, bits 31 to 0: cond | 0 0 0 0 | 1 | op | Rt | 1 | op2 | 1]

If op2 == 0b00, see Data-processing and miscellaneous instructions on page A5-4.

Table A5-11 shows the allocation of encodings in this space. The instruction encodings are all available in ARMv6T2 and above, and are UNDEFINED in earlier variants of the architecture.
Table A5-11 Extra load/store instructions (unprivileged)

op2  op  Rt    Instruction                        See
01   0   -     Store Halfword Unprivileged        STRHT on page A8-414
01   1   -     Load Halfword Unprivileged         LDRHT on page A8-158
1x   0   xxx0  UNPREDICTABLE                      -
1x   0   xxx1  UNDEFINED                          -
10   1   -     Load Signed Byte Unprivileged      LDRSBT on page A8-166
11   1   -     Load Signed Halfword Unprivileged  LDRSHT on page A8-174

A5.2.10 Synchronization primitives

[Encoding diagram, bits 31 to 0: cond | 0 0 0 1 | op | 1 0 0 1]

Table A5-12 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A5-12 Synchronization primitives

op    Instruction                          See                           Variant
0x00  Swap Word, Swap Byte                 SWP, SWPB on page A8-432 (a)  All
1000  Store Register Exclusive             STREX on page A8-400          v6
1001  Load Register Exclusive              LDREX on page A8-142          v6
1010  Store Register Exclusive Doubleword  STREXD on page A8-404         v6K
1011  Load Register Exclusive Doubleword   LDREXD on page A8-146         v6K
1100  Store Register Exclusive Byte        STREXB on page A8-402         v6K
1101  Load Register Exclusive Byte         LDREXB on page A8-144         v6K
1110  Store Register Exclusive Halfword    STREXH on page A8-406         v6K
1111  Load Register Exclusive Halfword     LDREXH on page A8-148         v6K

a. Use of these instructions is deprecated.

A5.2.11 MSR (immediate), and hints

[Encoding diagram, bits 31 to 0: cond | 0 0 1 1 0 | op | 1 0 | op1 | op2]

Table A5-13 shows the allocation of encodings in this space. Other encodings in this space are unallocated hints. They execute as NOPs, but software must not use them.
Table A5-13 MSR (immediate), and hints

op  op1         op2       Instruction                                   See                               Variant
0   0000        00000000  No Operation hint                             NOP on page A8-222                v6K, v6T2
0   0000        00000001  Yield hint                                    YIELD on page A8-812              v6K
0   0000        00000010  Wait For Event hint                           WFE on page A8-808                v6K
0   0000        00000011  Wait For Interrupt hint                       WFI on page A8-810                v6K
0   0000        00000100  Send Event hint                               SEV on page A8-316                v6K
0   0000        1111xxxx  Debug hint                                    DBG on page A8-88                 v7
0   0100, 1x00  -         Move to Special Register, application level   MSR (immediate) on page A8-208    All
0   xx01, xx1x  -         Move to Special Register, system level        MSR (immediate) on page B6-12     All
1   -           -         Move to Special Register, system level        MSR (immediate) on page B6-12     All

A5.2.12 Miscellaneous instructions

[Encoding diagram, bits 31 to 0: cond | 0 0 0 1 0 | op | 0 | op1 | 0 | op2]

Table A5-14 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A5-14 Miscellaneous instructions

op2  op  op1         Instruction or instruction class              See                                                Variant
000  x0  xxxx        Move Special Register to Register             MRS on page A8-206, MRS on page B6-10              All
000  01  xx00        Move to Special Register, application level   MSR (register) on page A8-210                      All
000  01  xx01, xx1x  Move to Special Register, system level        MSR (register) on page B6-14                       All
000  11  -           Move to Special Register, system level        MSR (register) on page B6-14                       All
001  01  -           Branch and Exchange                           BX on page A8-62                                   v4T
001  11  -           Count Leading Zeros                           CLZ on page A8-72                                  v6
010  01  -           Branch and Exchange Jazelle                   BXJ on page A8-64                                  v5TEJ
011  01  -           Branch with Link and Exchange                 BLX (register) on page A8-60                       v5T
101  -   -           Saturating addition and subtraction           Saturating addition and subtraction on page A5-13  -
111  01  -           Breakpoint                                    BKPT on page A8-56                                 v5T
111  11  -           Secure Monitor Call                           SMC (previously SMI) on page B6-18                 Security Extensions
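As an illustration of how Table A5-13 (MSR (immediate), and hints) can drive a decoder, here is a minimal Python sketch. The function name decode_msr_imm_or_hint is hypothetical, and the field positions are assumptions taken from the usual layout of this encoding space: op is bit [22], op1 is bits [19:16], and op2 is bits [7:0], with bits [27:23] == 0b00110 and bits [21:20] == 0b10.

```python
def decode_msr_imm_or_hint(instr):
    """Classify a 32-bit ARM instruction word using Table A5-13."""
    # Assumed fixed bits of this encoding space.
    if (instr >> 23) & 0x1F != 0b00110 or (instr >> 20) & 0x3 != 0b10:
        raise ValueError("not in the MSR (immediate) and hints space")

    op = (instr >> 22) & 0x1      # assumed position: bit [22]
    op1 = (instr >> 16) & 0xF     # assumed position: bits [19:16]
    op2 = instr & 0xFF            # assumed position: bits [7:0]

    if op == 1:
        return "MSR (immediate), system level"
    if op1 == 0b0000:
        hints = {0: "NOP", 1: "YIELD", 2: "WFE", 3: "WFI", 4: "SEV"}
        if op2 in hints:
            return hints[op2]
        if op2 & 0xF0 == 0xF0:
            return "DBG"
        # Other op2 values are unallocated hints that execute as NOPs.
        return "unallocated hint"
    if op1 in (0b0100, 0b1000, 0b1100):   # table rows 0100 and 1x00
        return "MSR (immediate), application level"
    return "MSR (immediate), system level"  # table rows xx01 and xx1x
```

For example, the word 0xE320F000 falls in the op1 == 0b0000, op2 == 0b00000000 row and is classified as NOP.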
A5.3 Load/store word and unsigned byte

[Encoding diagram, bits 31 to 0: cond | 0 1 | A | op1 | Rn | B]

These instructions have either A == 0 or B == 0. For instructions with A == 1 and B == 1, see Media instructions on page A5-21.

Table A5-15 shows the allocation of encodings in this space. These encodings are in all architecture variants.

Table A5-15 Single data transfer instructions

A  op1              B  Rn        Instruction                      See
0  xx0x0 not 0x010  -  -         Store Register                   STR (immediate, ARM) on page A8-384
1  xx0x0 not 0x010  0  -         Store Register                   STR (register) on page A8-386
0  0x010            -  -         Store Register Unprivileged      STRT on page A8-416
1  0x010            0  -         Store Register Unprivileged      STRT on page A8-416
0  xx0x1 not 0x011  -  not 1111  Load Register (immediate)        LDR (immediate, ARM) on page A8-120
0  xx0x1 not 0x011  -  1111      Load Register (literal)          LDR (literal) on page A8-122
1  xx0x1 not 0x011  0  -         Load Register                    LDR (register) on page A8-124
0  0x011            -  -         Load Register Unprivileged       LDRT on page A8-176
1  0x011            0  -         Load Register Unprivileged       LDRT on page A8-176
0  xx1x0 not 0x110  -  -         Store Register Byte (immediate)  STRB (immediate, ARM) on page A8-390
1  xx1x0 not 0x110  0  -         Store Register Byte (register)   STRB (register) on page A8-392
0  0x110            -  -         Store Register Byte Unprivileged STRBT on page A8-394
1  0x110            0  -         Store Register Byte Unprivileged STRBT on page A8-394
0  xx1x1 not 0x111  -  not 1111  Load Register Byte (immediate)   LDRB (immediate, ARM) on page A8-128
0  xx1x1 not 0x111  -  1111      Load Register Byte (literal)     LDRB (literal) on page A8-130
1  xx1x1 not 0x111  0  -         Load Register Byte (register)    LDRB (register) on page A8-132
0  0x111            -  -         Load Register Byte Unprivileged  LDRBT on page A8-134
1  0x111            0  -         Load Register Byte Unprivileged  LDRBT on page A8-134
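The rows of Table A5-15 follow a regular pattern, which the following Python sketch tries to make explicit. It is a sketch under stated assumptions, not a complete decoder: the function name decode_load_store_word_byte is hypothetical, and the field positions are assumed to be A = bit [25], op1 = bits [24:20], Rn = bits [19:16], and B = bit [4], with bits [27:26] == 0b01.

```python
def decode_load_store_word_byte(instr):
    """Classify a 32-bit ARM instruction word using Table A5-15."""
    if (instr >> 26) & 0x3 != 0b01:
        raise ValueError("not in the load/store word and unsigned byte space")

    a = (instr >> 25) & 0x1       # assumed position: bit [25]
    op1 = (instr >> 20) & 0x1F    # assumed position: bits [24:20]
    rn = (instr >> 16) & 0xF      # assumed position: bits [19:16]
    b = (instr >> 4) & 0x1        # assumed position: bit [4]

    if a == 1 and b == 1:
        return "media instruction"

    is_load = op1 & 0b00001                      # xxxx1 rows are loads
    is_byte = op1 & 0b00100                      # xx1xx rows are byte accesses
    unprivileged = (op1 & 0b10010) == 0b00010    # the 0x01x patterns

    name = ("LDR" if is_load else "STR") + ("B" if is_byte else "")
    if unprivileged:
        return name + "T"
    if a == 1:
        return name + " (register)"
    if is_load and rn == 0b1111:
        return name + " (literal)"
    return name + " (immediate, ARM)"
```

The unprivileged check relies on the observation that the excluded patterns 0x010, 0x011, 0x110, and 0x111 all have op1 bit [4] clear and op1 bit [1] set.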
A5.4 Media instructions

[Encoding diagram, bits 31 to 0: cond | 0 1 1 | op1 | Rd | op2 | 1 | Rn]

Table A5-16 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A5-16 Media instructions

op1    op2  Rd        Rn        Instructions                                          See                                                          Variant
000xx  -    -         -         -                                                     Parallel addition and subtraction, signed on page A5-22      -
001xx  -    -         -         -                                                     Parallel addition and subtraction, unsigned on page A5-23    -
01xxx  -    -         -         -                                                     Packing, unpacking, saturation, and reversal on page A5-24   -
10xxx  -    -         -         -                                                     Signed multiplies on page A5-26                              -
11000  000  1111      -         Unsigned Sum of Absolute Differences                  USAD8 on page A8-500                                         v6
11000  000  not 1111  -         Unsigned Sum of Absolute Differences and Accumulate   USADA8 on page A8-502                                        v6
1101x  x10  -         -         Signed Bit Field Extract                              SBFX on page A8-308                                          v6T2
1110x  x00  -         1111      Bit Field Clear                                       BFC on page A8-46                                            v6T2
1110x  x00  -         not 1111  Bit Field Insert                                      BFI on page A8-48                                            v6T2
1111x  x10  -         -         Unsigned Bit Field Extract                            UBFX on page A8-466                                          v6T2
11111  111  -         -         Permanently UNDEFINED. This space will not be allocated in future.

A5.4.1 Parallel addition and subtraction, signed

[Encoding diagram, bits 31 to 0: cond | 0 1 1 0 0 0 | op1 | op2 | 1]

Table A5-17 shows the allocation of encodings in this space. These encodings are all available in ARMv6 and above, and are UNDEFINED in earlier variants of the architecture.

Other encodings in this space are UNDEFINED.
Table A5-17 Signed parallel addition and subtraction instructions

op1  op2  Instruction                                See
01   000  Add 16-bit                                 SADD16 on page A8-296
01   001  Add and Subtract with Exchange             SASX on page A8-300
01   010  Subtract and Add with Exchange             SSAX on page A8-366
01   011  Subtract 16-bit                            SSUB16 on page A8-368
01   100  Add 8-bit                                  SADD8 on page A8-298
01   111  Subtract 8-bit                             SSUB8 on page A8-370

Saturating instructions
10   000  Saturating Add 16-bit                      QADD16 on page A8-252
10   001  Saturating Add and Subtract with Exchange  QASX on page A8-256
10   010  Saturating Subtract and Add with Exchange  QSAX on page A8-262
10   011  Saturating Subtract 16-bit                 QSUB16 on page A8-266
10   100  Saturating Add 8-bit                       QADD8 on page A8-254
10   111  Saturating Subtract 8-bit                  QSUB8 on page A8-268

Halving instructions
11   000  Halving Add 16-bit                         SHADD16 on page A8-318
11   001  Halving Add and Subtract with Exchange     SHASX on page A8-322
11   010  Halving Subtract and Add with Exchange     SHSAX on page A8-324
11   011  Halving Subtract 16-bit                    SHSUB16 on page A8-326
11   100  Halving Add 8-bit                          SHADD8 on page A8-320
11   111  Halving Subtract 8-bit                     SHSUB8 on page A8-328

A5.4.2 Parallel addition and subtraction, unsigned

[Encoding diagram, bits 31 to 0: cond | 0 1 1 0 0 1 | op1 | op2 | 1]

Table A5-18 shows the allocation of encodings in this space. These encodings are all available in ARMv6 and above, and are UNDEFINED in earlier variants of the architecture.

Other encodings in this space are UNDEFINED.
Table A5-18 Unsigned parallel addition and subtraction instructions

op1  op2  Instruction                                See
01   000  Add 16-bit                                 UADD16 on page A8-460
01   001  Add and Subtract with Exchange             UASX on page A8-464
01   010  Subtract and Add with Exchange             USAX on page A8-508
01   011  Subtract 16-bit                            USUB16 on page A8-510
01   100  Add 8-bit                                  UADD8 on page A8-462
01   111  Subtract 8-bit                             USUB8 on page A8-512

Saturating instructions
10   000  Saturating Add 16-bit                      UQADD16 on page A8-488
10   001  Saturating Add and Subtract with Exchange  UQASX on page A8-492
10   010  Saturating Subtract and Add with Exchange  UQSAX on page A8-494
10   011  Saturating Subtract 16-bit                 UQSUB16 on page A8-496
10   100  Saturating Add 8-bit                       UQADD8 on page A8-490
10   111  Saturating Subtract 8-bit                  UQSUB8 on page A8-498

Halving instructions
11   000  Halving Add 16-bit                         UHADD16 on page A8-470
11   001  Halving Add and Subtract with Exchange     UHASX on page A8-474
11   010  Halving Subtract and Add with Exchange     UHSAX on page A8-476
11   011  Halving Subtract 16-bit                    UHSUB16 on page A8-478
11   100  Halving Add 8-bit                          UHADD8 on page A8-472
11   111  Halving Subtract 8-bit                     UHSUB8 on page A8-480

A5.4.3 Packing, unpacking, saturation, and reversal

[Encoding diagram, bits 31 to 0: cond | 0 1 1 0 1 | op1 | A | op2 | 1]

Table A5-19 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A5-19 Packing, unpacking, saturation, and reversal instructions

op1  op2  A         Instructions                      See                      Variant
000  xx0  -         Pack Halfword                     PKH on page A8-234       v6
01x  xx0  -         Signed Saturate                   SSAT on page A8-362      v6
11x  xx0  -         Unsigned Saturate                 USAT on page A8-504      v6
000  011  not 1111  Signed Extend and Add Byte 16     SXTAB16 on page A8-436   v6
000  011  1111      Signed Extend Byte 16             SXTB16 on page A8-442    v6
000  101  -         Select Bytes                      SEL on page A8-312       v6
010  001  -         Signed Saturate 16                SSAT16 on page A8-364    v6
010  011  not 1111  Signed Extend and Add Byte        SXTAB on page A8-434     v6
010  011  1111      Signed Extend Byte                SXTB on page A8-440      v6
011  001  -         Byte-Reverse Word                 REV on page A8-272       v6
011  011  not 1111  Signed Extend and Add Halfword    SXTAH on page A8-438     v6
011  011  1111      Signed Extend Halfword            SXTH on page A8-444      v6
011  101  -         Byte-Reverse Packed Halfword      REV16 on page A8-274     v6
100  011  not 1111  Unsigned Extend and Add Byte 16   UXTAB16 on page A8-516   v6
100  011  1111      Unsigned Extend Byte 16           UXTB16 on page A8-522    v6
110  001  -         Unsigned Saturate 16              USAT16 on page A8-506    v6
110  011  not 1111  Unsigned Extend and Add Byte      UXTAB on page A8-514     v6
110  011  1111      Unsigned Extend Byte              UXTB on page A8-520      v6
111  001  -         Reverse Bits                      RBIT on page A8-270      v6T2
111  011  not 1111  Unsigned Extend and Add Halfword  UXTAH on page A8-518     v6
111  011  1111      Unsigned Extend Halfword          UXTH on page A8-524      v6
111  101  -         Byte-Reverse Signed Halfword      REVSH on page A8-276     v6

A5.4.4 Signed multiplies

[Encoding diagram, bits 31 to 0: cond | 0 1 1 1 0 | op1 | A | op2 | 1]

Table A5-20 shows the allocation of encodings in this space.
These encodings are all available in ARMv6T2 and above, and are UNDEFINED in earlier variants of the architecture.

Other encodings in this space are UNDEFINED.

Table A5-20 Signed multiply instructions

op1  op2  A         Instruction                                       See
000  00x  not 1111  Signed Multiply Accumulate Dual                   SMLAD on page A8-332
000  00x  1111      Signed Dual Multiply Add                          SMUAD on page A8-352
000  01x  not 1111  Signed Multiply Subtract Dual                     SMLSD on page A8-342
000  01x  1111      Signed Dual Multiply Subtract                     SMUSD on page A8-360
100  00x  -         Signed Multiply Accumulate Long Dual              SMLALD on page A8-338
100  01x  -         Signed Multiply Subtract Long Dual                SMLSLD on page A8-344
101  00x  not 1111  Signed Most Significant Word Multiply Accumulate  SMMLA on page A8-346
101  00x  1111      Signed Most Significant Word Multiply             SMMUL on page A8-350
101  11x  -         Signed Most Significant Word Multiply Subtract    SMMLS on page A8-348

A5.5 Branch, branch with link, and block data transfer

[Encoding diagram, bits 31 to 0: cond | 1 0 | op | Rn | R]

Table A5-21 shows the allocation of encodings in this space. These encodings are in all architecture variants.
Table A5-21 Branch, branch with link, and block data transfer instructions

op      R  Instructions                      See
0000x0  -  Store Multiple Decrement After    STMDA / STMED on page A8-376
0000x1  -  Load Multiple Decrement After     LDMDA / LDMFA on page A8-112
0010x0  -  Store Multiple (Increment After)  STM / STMIA / STMEA on page A8-374
0010x1  -  Load Multiple (Increment After)   LDM / LDMIA / LDMFD on page A8-110
0100x0  -  Store Multiple Decrement Before   STMDB / STMFD on page A8-378
0100x1  -  Load Multiple Decrement Before    LDMDB / LDMEA on page A8-114
0110x0  -  Store Multiple Increment Before   STMIB / STMFA on page A8-380
0110x1  -  Load Multiple Increment Before    LDMIB / LDMED on page A8-116
0xx1x0  -  Store Multiple (user registers)   STM (user registers) on page B6-22
0xx1x1  0  Load Multiple (user registers)    LDM (user registers) on page B6-7
0xx1x1  1  Load Multiple (exception return)  LDM (exception return) on page B6-5
10xxxx  -  Branch                            B on page A8-44
11xxxx  -  Branch with Link                  BL, BLX (immediate) on page A8-58

A5.6 Supervisor Call, and coprocessor instructions

[Encoding diagram, bits 31 to 0: cond | 1 1 | op1 | Rn | coproc | op]

Table A5-22 shows the allocation of encodings in this space.
Table A5-22 Supervisor Call, and coprocessor instructions

op1         op  coproc    Rn        Instructions                                       See                                                                                              Variant
0xxxxx (a)  -   101x      -         Advanced SIMD, VFP                                 Extension register load/store instructions on page A7-26                                         -
0xxxx0 (a)  -   not 101x  -         Store Coprocessor                                  STC, STC2 on page A8-372                                                                         All
0xxxx1 (a)  -   not 101x  not 1111  Load Coprocessor                                   LDC, LDC2 (immediate) on page A8-106                                                             All
0xxxx1 (a)  -   not 101x  1111      Load Coprocessor                                   LDC, LDC2 (literal) on page A8-108                                                               All
00000x      -   -         -         UNDEFINED                                          -                                                                                                -
00010x      -   101x      -         Advanced SIMD, VFP                                 64-bit transfers between ARM core and extension registers on page A7-32                          -
000100      -   not 101x  -         Move to Coprocessor from two ARM core registers    MCRR, MCRR2 on page A8-188                                                                       v5TE
000101      -   not 101x  -         Move to two ARM core registers from Coprocessor    MRRC, MRRC2 on page A8-204                                                                       v5TE
10xxxx      0   101x      -         -                                                  VFP data-processing instructions on page A7-24                                                   -
10xxxx      0   not 101x  -         Coprocessor data operations                        CDP, CDP2 on page A8-68                                                                          All
10xxxx      1   101x      -         Advanced SIMD, VFP                                 8, 16, and 32-bit transfer between ARM core and extension registers on page A7-31                -
10xxx0      1   not 101x  -         Move to Coprocessor from ARM core register         MCR, MCR2 on page A8-186                                                                         All
10xxx1      1   not 101x  -         Move to ARM core register from Coprocessor         MRC, MRC2 on page A8-202                                                                         All
11xxxx      -   -         -         Supervisor Call                                    SVC (previously SWI) on page A8-430                                                              All

a. But not 000x0x

For more information about specific coprocessors see Coprocessor support on page A2-68.

A5.7 Unconditional instructions

[Encoding diagram, bits 31 to 0: 1 1 1 1 | op1 | Rn | op]

Table A5-23 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED in ARMv5 and above.
All encodings in this space are UNPREDICTABLE in ARMv4 and ARMv4T.

Table A5-23 Unconditional instructions

op1                           op  Rn        Instruction                                       See                                                                                    Variant
0xxxxxxx                      -   -         -                                                 Miscellaneous instructions, memory hints, and Advanced SIMD instructions on page A5-31  -
100xx1x0                      -   -         Store Return State                                SRS on page B6-20                                                                      v6
100xx0x1                      -   -         Return From Exception                             RFE on page B6-16                                                                      v6
101xxxxx                      -   -         Branch with Link and Exchange                     BL, BLX (immediate) on page A8-58                                                      v5
11000x11, 11001xx1, 1101xxx1  -   not 1111  Load Coprocessor (immediate)                      LDC, LDC2 (immediate) on page A8-106                                                   v5
11000x11, 11001xx1, 1101xxx1  -   1111      Load Coprocessor (literal)                        LDC, LDC2 (literal) on page A8-108                                                     v5
11000x10, 11001xx0, 1101xxx0  -   -         Store Coprocessor                                 STC, STC2 on page A8-372                                                               v5
11000100                      -   -         Move to Coprocessor from two ARM core registers   MCRR, MCRR2 on page A8-188                                                             v6
11000101                      -   -         Move to two ARM core registers from Coprocessor   MRRC, MRRC2 on page A8-204                                                             v6
1110xxxx                      0   -         Coprocessor data operations                       CDP, CDP2 on page A8-68                                                                v5
1110xxx0                      1   -         Move to Coprocessor from ARM core register        MCR, MCR2 on page A8-186                                                               v5
1110xxx1                      1   -         Move to ARM core register from Coprocessor        MRC, MRC2 on page A8-202                                                               v5

A5.7.1 Miscellaneous instructions, memory hints, and Advanced SIMD instructions

[Encoding diagram, bits 31 to 0: 1 1 1 1 0 | op1 | Rn | op2]

Table A5-24 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED in ARMv5 and above. All these encodings are UNPREDICTABLE in ARMv4 and ARMv4T.
Table A5-24 Hints, and Advanced SIMD instructions

op1      op2   Rn        Instruction                             See                                                                        Variant
0010000  xx0x  xxx0      Change Processor State                  CPS on page B6-3                                                           v6
0010000  0000  xxx1      Set Endianness                          SETEND on page A8-314                                                      v6
01xxxxx  -     -         -                                       See Advanced SIMD data-processing instructions on page A7-10               v7
100xxx0  -     -         -                                       See Advanced SIMD element or structure load/store instructions on page A7-27  v7
100x001  -     -         Unallocated memory hint (treat as NOP)  -                                                                          MP Extensions (a)
100x101  -     -         Preload Instruction                     PLI (immediate, literal) on page A8-242                                    v7
101x001  -     not 1111  Preload Data with intent to Write       PLD, PLDW (immediate) on page A8-236                                       MP Extensions (a)
101x001  -     1111      UNPREDICTABLE                           -                                                                          -
101x101  -     not 1111  Preload Data                            PLD, PLDW (immediate) on page A8-236                                       v5TE
101x101  -     1111      Preload Data                            PLD (literal) on page A8-238                                               v5TE
1010111  0001  -         Clear-Exclusive                         CLREX on page A8-70                                                        v6K
1010111  0100  -         Data Synchronization Barrier            DSB on page A8-92                                                          v6T2
1010111  0101  -         Data Memory Barrier                     DMB on page A8-90                                                          v7
1010111  0110  -         Instruction Synchronization Barrier     ISB on page A8-102                                                         v6T2
10xxx11  -     -         UNPREDICTABLE except as shown above     -                                                                          -
110x001  xxx0  -         Unallocated memory hint (treat as NOP)  -                                                                          MP Extensions (a)
110x101  xxx0  -         Preload Instruction                     PLI (register) on page A8-244                                              v7
111x001  xxx0  -         Preload Data with intent to Write       PLD, PLDW (register) on page A8-240                                        MP Extensions (a)
111x101  xxx0  -         Preload Data                            PLD, PLDW (register) on page A8-240                                        v5TE
11xxx11  xxx0  -         UNPREDICTABLE                           -                                                                          -

a. Multiprocessing Extensions.

Chapter A6 Thumb Instruction Set Encoding

This chapter introduces the Thumb instruction set and describes how it uses the ARM programmers’ model.
It contains the following sections:
• Thumb instruction set encoding on page A6-2
• 16-bit Thumb instruction encoding on page A6-6
• 32-bit Thumb instruction encoding on page A6-14.

For details of the differences between the Thumb and ThumbEE instruction sets see Chapter A9 ThumbEE.

Note
• Architecture variant information in this chapter describes the architecture variant or extension in which the instruction encoding was introduced into the Thumb instruction set.
• In the decode tables in this chapter, an entry of - for a field value means the value of the field does not affect the decoding.

A6.1 Thumb instruction set encoding

The Thumb instruction stream is a sequence of halfword-aligned halfwords. Each Thumb instruction is either a single 16-bit halfword in that stream, or a 32-bit instruction consisting of two consecutive halfwords in that stream.

If bits [15:11] of the halfword being decoded take any of the following values, the halfword is the first halfword of a 32-bit instruction:
• 0b11101
• 0b11110
• 0b11111.
Otherwise, the halfword is a 16-bit instruction.

For details of the encoding of 16-bit Thumb instructions see 16-bit Thumb instruction encoding on page A6-6.

For details of the encoding of 32-bit Thumb instructions see 32-bit Thumb instruction encoding on page A6-14.

A6.1.1 UNDEFINED and UNPREDICTABLE instruction set space

An attempt to execute an unallocated instruction results in either:
• Unpredictable behavior. The instruction is described as UNPREDICTABLE.
• An Undefined Instruction exception. The instruction is described as UNDEFINED.

An instruction is UNDEFINED if it is declared as UNDEFINED in an instruction description, or in this chapter.
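The rule in A6.1 for telling 16-bit and 32-bit Thumb instructions apart is simple enough to sketch directly. The Python below is a minimal illustration, assuming the instruction stream is already available as a list of 16-bit integers; the names is_32bit_first_halfword and split_thumb_stream are hypothetical.

```python
def is_32bit_first_halfword(halfword):
    """True if bits [15:11] mark the first halfword of a 32-bit instruction."""
    return ((halfword >> 11) & 0x1F) in (0b11101, 0b11110, 0b11111)


def split_thumb_stream(halfwords):
    """Group a list of halfwords into 1-tuples (16-bit) and 2-tuples (32-bit)."""
    instructions = []
    i = 0
    while i < len(halfwords):
        if is_32bit_first_halfword(halfwords[i]):
            if i + 1 >= len(halfwords):
                raise ValueError("stream ends inside a 32-bit instruction")
            # The second halfword is consumed without inspecting its top bits.
            instructions.append((halfwords[i], halfwords[i + 1]))
            i += 2
        else:
            instructions.append((halfwords[i],))
            i += 1
    return instructions
```

Note that the second halfword of a 32-bit instruction is never classified on its own, so the grouping depends on starting at a known instruction boundary.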
An instruction is UNPREDICTABLE if:
• a bit marked (0) or (1) in the encoding diagram of an instruction is not 0 or 1 respectively, and the pseudocode for that encoding does not indicate that a different special case applies
• it is declared as UNPREDICTABLE in an instruction description or in this chapter.

Unless otherwise specified:
• Thumb instructions introduced in an architecture variant are either UNPREDICTABLE or UNDEFINED in earlier architecture variants.
• A Thumb instruction that is provided by one or more of the architecture extensions is either UNPREDICTABLE or UNDEFINED in an implementation that does not include any of those extensions.

In both cases, the instruction is UNPREDICTABLE if it is a 32-bit instruction in an architecture variant before ARMv6T2, and UNDEFINED otherwise.

A6.1.2 Use of 0b1111 as a register specifier

The use of 0b1111 as a register specifier is not normally permitted in Thumb instructions. When a value of 0b1111 is permitted, a variety of meanings is possible. For register reads, these meanings are:
• Read the PC value, that is, the address of the current instruction + 4. The base register of the table branch instructions TBB and TBH can be the PC. This enables branch tables to be placed in memory immediately after the instruction.
  Note: Use of the PC as the base register in the STC instruction is deprecated in ARMv7.
• Read the word-aligned PC value, that is, the address of the current instruction + 4, with bits [1:0] forced to zero. The base register of LDC, LDR, LDRB, LDRD (pre-indexed, no writeback), LDRH, LDRSB, and LDRSH instructions can be the word-aligned PC. This enables PC-relative data addressing. In addition, some encodings of the ADD and SUB instructions permit their source registers to be 0b1111 for the same purpose.
• Read zero.
  This is done in some cases when one instruction is a special case of another, more general instruction, but with one operand zero. In these cases, the instructions are listed on separate pages, with a special case in the pseudocode for the more general instruction cross-referencing the other page.

For register writes, these meanings are:
• The PC can be specified as the destination register of an LDR instruction. This is done by encoding Rt as 0b1111. The loaded value is treated as an address, and the effect of execution is a branch to that address. Bit [0] of the loaded value selects whether to execute ARM or Thumb instructions after the branch.
  Some other instructions write the PC in similar ways, either implicitly (for example branch instructions) or by using a register mask rather than a register specifier (LDM). The address to branch to can be:
  - a loaded value, for example, RFE
  - a register value, for example, BX
  - the result of a calculation, for example, TBB or TBH.
  The method of choosing the instruction set used after the branch can be:
  - similar to the LDR case, for LDM or BX
  - a fixed instruction set other than the one currently being used, for example, the immediate form of BLX
  - unchanged, for example branch instructions
  - set from the (J,T) bits of the SPSR, for RFE and SUBS PC,LR,#imm8.
• Discard the result of a calculation. This is done in some cases when one instruction is a special case of another, more general instruction, but with the result discarded. In these cases, the instructions are listed on separate pages, with a special case in the pseudocode for the more general instruction cross-referencing the other page.
• If the destination register specifier of an LDRB, LDRH, LDRSB, or LDRSH instruction is 0b1111, the instruction is a memory hint instead of a load operation.
• If the destination register specifier of an MRC instruction is 0b1111, bits [31:28] of the value transferred from the coprocessor are written to the N, Z, C, and V flags in the APSR, and bits [27:0] are discarded.

A6.1.3 Use of 0b1101 as a register specifier

R13 is defined in the Thumb instruction set so that its use is primarily as a stack pointer, and R13 is normally identified as SP in Thumb instructions. In 32-bit Thumb instructions, if you use R13 as a general-purpose register beyond the architecturally defined constraints described in this section, the results are UNPREDICTABLE.

The restrictions applicable to R13 are described in:
• R13[1:0] definition
• 32-bit Thumb instruction support for R13.

See also 16-bit Thumb instruction support for R13 on page A6-5.

R13[1:0] definition

Bits [1:0] of R13 are SBZP. Writing a nonzero value to bits [1:0] causes UNPREDICTABLE behavior.

32-bit Thumb instruction support for R13

R13 instruction support is restricted to the following:
• R13 as the source or destination register of a MOV instruction. Only register to register transfers without shifts are supported, with no flag setting:
    MOV     SP,<Rm>
    MOV     <Rn>,SP
• Using the following instructions to adjust R13 up or down by a multiple of 4:
    ADD{W}  SP,SP,#<imm>
    SUB{W}  SP,SP,#<imm>
    ADD     SP,SP,<Rm>
    ADD     SP,SP,<Rm>,LSL #<n>   ; For <n> = 1,2,3
    SUB     SP,SP,<Rm>
    SUB     SP,SP,<Rm>,LSL #<n>   ; For <n> = 1,2,3
• R13 as a base register of any load/store instruction. This supports SP-based addressing for load, store, or memory hint instructions, with positive or negative offsets, with and without writeback.
• R13 as the first operand in any ADD{S}, CMN, CMP, or SUB{S} instruction. The add and subtract instructions support SP-based address generation, with the address going into a general-purpose register. CMN and CMP are useful for stack checking in some circumstances.
• R13 as the transferred register in any LDR or STR instruction.
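The MRC special case described in A6.1.2, where a destination of 0b1111 routes the top four bits of the transferred value into the condition flags, can be pictured with a small Python sketch. The function name mrc_value_to_flags and the dictionary return type are illustrative assumptions, not part of the architecture.

```python
def mrc_value_to_flags(value):
    """Map bits [31:28] of a transferred value to N, Z, C, V.

    Bits [27:0] are discarded, as described for MRC with a
    destination register specifier of 0b1111.
    """
    value &= 0xFFFFFFFF
    return {
        "N": (value >> 31) & 1,
        "Z": (value >> 30) & 1,
        "C": (value >> 29) & 1,
        "V": (value >> 28) & 1,
    }


flags = mrc_value_to_flags(0xA0000123)  # low 28 bits do not affect the result
```

For the value 0xA0000123 the top nibble is 0b1010, so N and C are set and Z and V are clear.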
16-bit Thumb instruction support for R13

For 16-bit data-processing instructions that affect high registers, R13 can only be used as described in 32-bit Thumb instruction support for R13 on page A6-4. Any other use is deprecated.

This affects the high register forms of CMP and ADD, where the use of R13 as <Rm> is deprecated.

A6.2 16-bit Thumb instruction encoding

[Encoding diagram, bits 15 to 0: Opcode]

Table A6-1 shows the allocation of 16-bit instruction encodings.

Table A6-1 16-bit Thumb instruction encoding

Opcode                  Instruction or instruction class                                              Variant
00xxxx                  Shift (immediate), add, subtract, move, and compare on page A6-7              -
010000                  Data-processing on page A6-8                                                  -
010001                  Special data instructions and branch and exchange on page A6-9                -
01001x                  Load from Literal Pool, see LDR (literal) on page A8-122                      v4T
0101xx, 011xxx, 100xxx  Load/store single data item on page A6-10                                     -
10100x                  Generate PC-relative address, see ADR on page A8-32                           v4T
10101x                  Generate SP-relative address, see ADD (SP plus immediate) on page A8-28       v4T
1011xx                  Miscellaneous 16-bit instructions on page A6-11                               -
11000x                  Store multiple registers, see STM / STMIA / STMEA on page A8-374 (a)          v4T
11001x                  Load multiple registers, see LDM / LDMIA / LDMFD on page A8-110 (a)           v4T
1101xx                  Conditional branch, and Supervisor Call on page A6-13                         -
11100x                  Unconditional Branch, see B on page A8-44                                     v4T

a. In ThumbEE, 16-bit load/store multiple instructions are not available. This encoding is used for special ThumbEE instructions. For details see Chapter A9 ThumbEE.

A6.2.1 Shift (immediate), add, subtract, move, and compare

[Encoding diagram, bits 15 to 0: 0 0 | Opcode]

Table A6-2 shows the allocation of encodings in this space.
All these instructions have been available since the Thumb instruction set was introduced in ARMv4T.

Table A6-2 16-bit Thumb shift (immediate), add, subtract, move, and compare instructions

Opcode   Instruction                See
000xx    Logical Shift Left         LSL (immediate) on page A8-178
001xx    Logical Shift Right        LSR (immediate) on page A8-182
010xx    Arithmetic Shift Right     ASR (immediate) on page A8-40
01100    Add register               ADD (register) on page A8-24
01101    Subtract register          SUB (register) on page A8-422
01110    Add 3-bit immediate        ADD (immediate, Thumb) on page A8-20
01111    Subtract 3-bit immediate   SUB (immediate, Thumb) on page A8-418
100xx    Move                       MOV (immediate) on page A8-194
101xx    Compare                    CMP (immediate) on page A8-80
110xx    Add 8-bit immediate        ADD (immediate, Thumb) on page A8-20
111xx    Subtract 8-bit immediate   SUB (immediate, Thumb) on page A8-418

A6.2.2 Data-processing

Encoding: bits [15:10] = 0b010000, bits [9:6] = Opcode.

Table A6-3 shows the allocation of encodings in this space.

All these instructions have been available since the Thumb instruction set was introduced in ARMv4T.
Table A6-3 16-bit Thumb data-processing instructions

Opcode   Instruction               See
0000     Bitwise AND               AND (register) on page A8-36
0001     Bitwise Exclusive OR      EOR (register) on page A8-96
0010     Logical Shift Left        LSL (register) on page A8-180
0011     Logical Shift Right       LSR (register) on page A8-184
0100     Arithmetic Shift Right    ASR (register) on page A8-42
0101     Add with Carry            ADC (register) on page A8-16
0110     Subtract with Carry       SBC (register) on page A8-304
0111     Rotate Right              ROR (register) on page A8-280
1000     Test                      TST (register) on page A8-456
1001     Reverse Subtract from 0   RSB (immediate) on page A8-284
1010     Compare High Registers    CMP (register) on page A8-82
1011     Compare Negative          CMN (register) on page A8-76
1100     Bitwise OR                ORR (register) on page A8-230
1101     Multiply Two Registers    MUL on page A8-212
1110     Bitwise Bit Clear         BIC (register) on page A8-52
1111     Bitwise NOT               MVN (register) on page A8-216

A6.2.3 Special data instructions and branch and exchange

Encoding: bits [15:10] = 0b010001, bits [9:6] = Opcode.

Table A6-4 shows the allocation of encodings in this space.

Table A6-4 16-bit Thumb special data instructions and branch and exchange

Opcode       Instruction                     See                              Variant
0000         Add Low Registers               ADD (register) on page A8-24     v6T2 a
0001, 001x   Add High Registers              ADD (register) on page A8-24     v4T
0100         UNPREDICTABLE                   -                                -
0101, 011x   Compare High Registers          CMP (register) on page A8-82     v4T
1000         Move Low Registers              MOV (register) on page A8-196    v6 a
1001, 101x   Move High Registers             MOV (register) on page A8-196    v4T
110x         Branch and Exchange             BX on page A8-62                 v4T
111x         Branch with Link and Exchange   BLX (register) on page A8-60     v5T a

a. UNPREDICTABLE in earlier variants.
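The allocation in Table A6-4 can be sketched as a small decoder. This is an illustrative sketch, not part of the architecture definition: the function name classify_special_data_bx is hypothetical, and it assumes the Opcode field occupies bits [9:6] of the halfword, below the fixed 0b010001 prefix in bits [15:10]. Architecture variants (v4T, v6, v6T2) are not modelled.

```python
def classify_special_data_bx(hw: int) -> str:
    """Classify a 16-bit Thumb halfword in the 0b010001 space (Table A6-4).

    Assumption: Opcode is bits [9:6]; the name and return strings are
    illustrative only.
    """
    if (hw >> 10) & 0x3F != 0b010001:
        raise ValueError("halfword is not in the special data / BX space")
    opcode = (hw >> 6) & 0xF
    if opcode == 0b0100:
        return "UNPREDICTABLE"
    group = opcode >> 2                      # top two bits of Opcode
    if group == 0b00:
        return "ADD (register)"              # 0000, 0001, 001x
    if group == 0b01:
        return "CMP (register)"              # 0101, 011x
    if group == 0b10:
        return "MOV (register)"              # 1000, 1001, 101x
    # group == 0b11: 110x is BX, 111x is BLX (register)
    return "BX" if (opcode >> 1) == 0b110 else "BLX (register)"
```

The decoder returns only the instruction name from the See column; a fuller model would also extract the register fields, which this section does not define.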
A6.2.4 Load/store single data item

Encoding: bits [15:12] = opA, bits [11:9] = opB.

These instructions have one of the following values in opA:
• 0b0101
• 0b011x
• 0b100x.

Table A6-5 shows the allocation of encodings in this space.

All these instructions have been available since the Thumb instruction set was introduced in ARMv4T.

Table A6-5 16-bit Thumb Load/store instructions

opA    opB   Instruction                      See
0101   000   Store Register                   STR (register) on page A8-386
0101   001   Store Register Halfword          STRH (register) on page A8-412
0101   010   Store Register Byte              STRB (register) on page A8-392
0101   011   Load Register Signed Byte        LDRSB (register) on page A8-164
0101   100   Load Register                    LDR (register) on page A8-124
0101   101   Load Register Halfword           LDRH (register) on page A8-156
0101   110   Load Register Byte               LDRB (register) on page A8-132
0101   111   Load Register Signed Halfword    LDRSH (register) on page A8-172
0110   0xx   Store Register                   STR (immediate, Thumb) on page A8-382
0110   1xx   Load Register                    LDR (immediate, Thumb) on page A8-118
0111   0xx   Store Register Byte              STRB (immediate, Thumb) on page A8-388
0111   1xx   Load Register Byte               LDRB (immediate, Thumb) on page A8-126
1000   0xx   Store Register Halfword          STRH (immediate, Thumb) on page A8-408
1000   1xx   Load Register Halfword           LDRH (immediate, Thumb) on page A8-150
1001   0xx   Store Register SP relative       STR (immediate, Thumb) on page A8-382
1001   1xx   Load Register SP relative        LDR (immediate, Thumb) on page A8-118

A6.2.5 Miscellaneous 16-bit instructions

Encoding: bits [15:12] = 0b1011, bits [11:5] = Opcode.

Table A6-6 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A6-6 Miscellaneous 16-bit instructions

Opcode    Instruction                     See                                       Variant
0110010   Set Endianness                  SETEND on page A8-314                     v6
0110011   Change Processor State          CPS on page B6-3                          v6
00000xx   Add Immediate to SP             ADD (SP plus immediate) on page A8-28     v4T
00001xx   Subtract Immediate from SP      SUB (SP minus immediate) on page A8-426   v4T
0001xxx   Compare and Branch on Zero      CBNZ, CBZ on page A8-66                   v6T2
001000x   Signed Extend Halfword          SXTH on page A8-444                       v6
001001x   Signed Extend Byte              SXTB on page A8-440                       v6
001010x   Unsigned Extend Halfword        UXTH on page A8-524                       v6
001011x   Unsigned Extend Byte            UXTB on page A8-520                       v6
0011xxx   Compare and Branch on Zero      CBNZ, CBZ on page A8-66                   v6T2
010xxxx   Push Multiple Registers         PUSH on page A8-248                       v4T
1001xxx   Compare and Branch on Nonzero   CBNZ, CBZ on page A8-66                   v6T2
101000x   Byte-Reverse Word               REV on page A8-272                        v6
101001x   Byte-Reverse Packed Halfword    REV16 on page A8-274                      v6
101011x   Byte-Reverse Signed Halfword    REVSH on page A8-276                      v6
1011xxx   Compare and Branch on Nonzero   CBNZ, CBZ on page A8-66                   v6T2
110xxxx   Pop Multiple Registers          POP on page A8-246                        v4T
1110xxx   Breakpoint                      BKPT on page A8-56                        v5
1111xxx   If-Then, and hints              If-Then, and hints on page A6-12          -

If-Then, and hints

Encoding: bits [15:8] = 0b10111111, bits [7:4] = opA, bits [3:0] = opB.

Table A6-7 shows the allocation of encodings in this space.

Other encodings in this space are unallocated hints. They execute as NOPs, but software must not use them.

Table A6-7 Miscellaneous 16-bit instructions

opA    opB        Instruction               See                    Variant
-      not 0000   If-Then                   IT on page A8-104      v6T2
0000   0000       No Operation hint         NOP on page A8-222     v6T2
0001   0000       Yield hint                YIELD on page A8-812   v7
0010   0000       Wait For Event hint       WFE on page A8-808     v7
0011   0000       Wait For Interrupt hint   WFI on page A8-810     v7
0100   0000       Send Event hint           SEV on page A8-316     v7
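As an illustration of how Table A6-7 separates IT from the hint instructions, the following sketch classifies a halfword in this space. The function name classify_it_or_hint and the HINTS table are hypothetical, and the sketch assumes opA is bits [7:4] and opB is bits [3:0] under the fixed 0b10111111 prefix. It does not model the architecture variant in which each hint becomes available.

```python
# Hint allocations from Table A6-7, keyed by opA (valid only when opB == 0).
HINTS = {
    0b0000: "NOP",
    0b0001: "YIELD",
    0b0010: "WFE",
    0b0011: "WFI",
    0b0100: "SEV",
}

def classify_it_or_hint(hw: int) -> str:
    """Classify a 16-bit Thumb halfword in the If-Then and hints space.

    Assumption: opA = bits [7:4], opB = bits [3:0]; names are illustrative.
    """
    if (hw >> 8) & 0xFF != 0b10111111:
        raise ValueError("halfword is not in the If-Then and hints space")
    op_a = (hw >> 4) & 0xF
    op_b = hw & 0xF
    if op_b != 0:
        return "IT"                          # any nonzero opB selects IT
    # opB == 0: allocated hint, or an unallocated hint that executes as a NOP
    return HINTS.get(op_a, "unallocated hint (executes as NOP)")
```

Note that the check on opB comes first: opA is a don't-care for IT, so it must not be looked up until opB is known to be zero.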
A6.2.6 Conditional branch, and Supervisor Call

Encoding: bits [15:12] = 0b1101, bits [11:8] = Opcode.

Table A6-8 shows the allocation of encodings in this space.

All these instructions have been available since the Thumb instruction set was introduced in ARMv4T.

Table A6-8 Conditional branch and Supervisor Call instructions

Opcode     Instruction                                                           See
not 111x   Conditional branch                                                    B on page A8-44
1110       Permanently UNDEFINED. This space will not be allocated in future.    -
1111       Supervisor Call                                                       SVC (previously SWI) on page A8-430

A6.3 32-bit Thumb instruction encoding

Encoding: first halfword bits [15:13] = 0b111, bits [12:11] = op1, bits [10:4] = op2; second halfword bit [15] = op.

If op1 == 0b00, a 16-bit instruction is encoded, see 16-bit Thumb instruction encoding on page A6-6.

Table A6-9 shows the allocation of encodings in this space.

Table A6-9 32-bit Thumb instruction encoding

op1   op2       op   Instruction class, see
01    00xx0xx   -    Load/store multiple on page A6-23
01    00xx1xx   -    Load/store dual, load/store exclusive, table branch on page A6-24
01    01xxxxx   -    Data-processing (shifted register) on page A6-31
01    1xxxxxx   -    Coprocessor instructions on page A6-40
10    x0xxxxx   0    Data-processing (modified immediate) on page A6-15
10    x1xxxxx   0    Data-processing (plain binary immediate) on page A6-19
10    -         1    Branches and miscellaneous control on page A6-20
11    000xxx0   -    Store single data item on page A6-30
11    001xxx0   -    Advanced SIMD element or structure load/store instructions on page A7-27
11    00xx001   -    Load byte, memory hints on page A6-28
11    00xx011   -    Load halfword, memory hints on page A6-26
11    00xx101   -    Load word on page A6-25
11    00xx111   -    UNDEFINED
11    010xxxx   -    Data-processing (register) on page A6-33
11    0110xxx   -    Multiply, multiply accumulate, and absolute difference on page A6-38
11    0111xxx   -    Long multiply, long multiply accumulate, and divide on page A6-39
11    1xxxxxx   -    Coprocessor instructions on page A6-40

A6.3.1 Data-processing (modified immediate)

Encoding: first halfword bits [15:11] = 0b11110, bit [9] = 0, bits [8:5] = op, bit [4] = S, bits [3:0] = Rn; second halfword bit [15] = 0, bits [11:8] = Rd.

Table A6-10 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

In the Rn, Rd and S columns, - indicates that the value of the field does not affect the decoding.

These encodings are all available in ARMv6T2 and above.

Table A6-10 32-bit modified immediate data-processing instructions

op     Rn         Rd         S   Instruction            See
0000   -          not 1111   x   Bitwise AND            AND (immediate) on page A8-34
0000   -          1111       0   UNPREDICTABLE          -
0000   -          1111       1   Test                   TST (immediate) on page A8-454
0001   -          -          -   Bitwise Bit Clear      BIC (immediate) on page A8-50
0010   not 1111   -          -   Bitwise OR             ORR (immediate) on page A8-228
0010   1111       -          -   Move                   MOV (immediate) on page A8-194
0011   not 1111   -          -   Bitwise OR NOT         ORN (immediate) on page A8-224
0011   1111       -          -   Bitwise NOT            MVN (immediate) on page A8-214
0100   -          not 1111   x   Bitwise Exclusive OR   EOR (immediate) on page A8-94
0100   -          1111       0   UNPREDICTABLE          -
0100   -          1111       1   Test Equivalence       TEQ (immediate) on page A8-448
1000   -          not 1111   -   Add                    ADD (immediate, Thumb) on page A8-20
1000   -          1111       0   UNPREDICTABLE          -
1000   -          1111       1   Compare Negative       CMN (immediate) on page A8-74
1010   -          -          -   Add with Carry         ADC (immediate) on page A8-14
1011   -          -          -   Subtract with Carry    SBC (immediate) on page A8-302
1101   -          not 1111   -   Subtract               SUB (immediate, Thumb) on page A8-418
1101   -          1111       0   UNPREDICTABLE          -
1101   -          1111       1   Compare                CMP (immediate) on page A8-80
1110   -          -          -   Reverse Subtract       RSB (immediate) on page A8-284

These instructions all have modified immediate constants, rather than a simple 12-bit binary number.
This provides a more useful range of values. For details see Modified immediate constants in Thumb instructions on page A6-17.

A6.3.2 Modified immediate constants in Thumb instructions

Encoding: first halfword bit [10] = i; second halfword bits [14:12] = imm3, bits [7:0] = a b c d e f g h.

Table A6-11 shows the range of modified immediate constants available in Thumb data-processing instructions, and how they are encoded in the a, b, c, d, e, f, g, h, i, and imm3 fields in the instruction.

Table A6-11 Encoding of modified immediates in Thumb data-processing instructions

i:imm3:a   <const> a
0000x      00000000 00000000 00000000 abcdefgh
0001x      00000000 abcdefgh 00000000 abcdefgh   b
0010x      abcdefgh 00000000 abcdefgh 00000000   b
0011x      abcdefgh abcdefgh abcdefgh abcdefgh   b
01000      1bcdefgh 00000000 00000000 00000000
01001      01bcdefg h0000000 00000000 00000000   c
01010      001bcdef gh000000 00000000 00000000
01011      0001bcde fgh00000 00000000 00000000   c
  .
  .        8-bit values shifted to other positions
  .
11101      00000000 00000000 000001bc defgh000   c
11110      00000000 00000000 0000001b cdefgh00
11111      00000000 00000000 00000001 bcdefgh0   c

a. In this table, the immediate constant value is shown in binary form, to relate abcdefgh to the encoding diagram. In assembly syntax, the immediate value is specified in the usual way (a decimal number by default).
b. Not available in ARM instructions. UNPREDICTABLE if abcdefgh == 00000000.
c. Not available in ARM instructions if h == 1.

Note

The range of values available in Thumb modified immediate constants is slightly different from the range of values available in ARM instructions. See Modified immediate constants in ARM instructions on page A5-9 for the ARM values.
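The expansion described by Table A6-11 can be sketched in Python as follows. This is an illustrative sketch only: the function name thumb_expand_imm is hypothetical, the 12-bit input is assumed to be the concatenation i:imm3:abcdefgh, and the UNPREDICTABLE cases of footnote b are modelled by raising ValueError, which is a choice made for this sketch rather than architected behavior. The carry flag is not modelled.

```python
def thumb_expand_imm(imm12: int) -> int:
    """Expand a 12-bit Thumb modified immediate (i:imm3:abcdefgh) to 32 bits.

    Illustrative sketch of Table A6-11; UNPREDICTABLE cases raise ValueError.
    """
    if not 0 <= imm12 <= 0xFFF:
        raise ValueError("imm12 must be a 12-bit value")
    byte = imm12 & 0xFF                      # abcdefgh
    if (imm12 >> 10) == 0b00:
        # i:imm3:a == 00xxx: byte replicated according to imm12<9:8>
        mode = (imm12 >> 8) & 0b11
        if mode == 0b00:
            return byte                      # 00000000 00000000 00000000 abcdefgh
        if byte == 0:
            raise ValueError("UNPREDICTABLE: abcdefgh == 00000000")
        if mode == 0b01:
            return (byte << 16) | byte       # 00000000 abcdefgh 00000000 abcdefgh
        if mode == 0b10:
            return (byte << 24) | (byte << 8)  # abcdefgh 00000000 abcdefgh 00000000
        return byte * 0x01010101             # abcdefgh in all four bytes
    # Otherwise: the 8-bit value 1bcdefgh rotated right by i:imm3:a (8 to 31)
    unrotated = 0x80 | (imm12 & 0x7F)
    rotation = imm12 >> 7
    rotated = (unrotated >> rotation) | (unrotated << (32 - rotation))
    return rotated & 0xFFFFFFFF
```

For example, thumb_expand_imm(0x400) corresponds to the 01000 row with bcdefgh all zero, and thumb_expand_imm(0xF80) to the 11111 row.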
Carry out

A logical instruction with i:imm3:a == '00xxx' does not affect the carry flag. Otherwise, a logical instruction that sets the flags sets the Carry flag to the value of bit [31] of the modified immediate constant.

Operation

    // ThumbExpandImm()
    // ================

    bits(32) ThumbExpandImm(bits(12) imm12)

        // APSR.C argument to following function call does not affect the imm32 result.
        (imm32, -) = ThumbExpandImm_C(imm12, APSR.C);

        return imm32;

    // ThumbExpandImm_C()
    // ==================

    (bits(32), bit) ThumbExpandImm_C(bits(12) imm12, bit carry_in)

        if imm12<11:10> == '00' then

            case imm12<9:8> of
                when '00'
                    imm32 = ZeroExtend(imm12<7:0>, 32);
                when '01'
                    if imm12<7:0> == '00000000' then UNPREDICTABLE;
                    imm32 = '00000000' : imm12<7:0> : '00000000' : imm12<7:0>;
                when '10'
                    if imm12<7:0> == '00000000' then UNPREDICTABLE;
                    imm32 = imm12<7:0> : '00000000' : imm12<7:0> : '00000000';
                when '11'
                    if imm12<7:0> == '00000000' then UNPREDICTABLE;
                    imm32 = imm12<7:0> : imm12<7:0> : imm12<7:0> : imm12<7:0>;
            carry_out = carry_in;

        else

            unrotated_value = ZeroExtend('1':imm12<6:0>, 32);
            (imm32, carry_out) = ROR_C(unrotated_value, UInt(imm12<11:7>));

        return (imm32, carry_out);

A6.3.3 Data-processing (plain binary immediate)

Encoding: first halfword bits [15:11] = 0b11110, bit [9] = 1, bits [8:4] = op, bits [3:0] = Rn; second halfword bit [15] = 0.

Table A6-12 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.
Table A6-12 32-bit unmodified immediate data-processing instructions

op        Rn         Instruction                    See
00000     not 1111   Add Wide (12-bit)              ADD (immediate, Thumb) on page A8-20
00000     1111       Form PC-relative Address       ADR on page A8-32
00100     -          Move Wide (16-bit)             MOV (immediate) on page A8-194
01010     not 1111   Subtract Wide (12-bit)         SUB (immediate, Thumb) on page A8-418
01010     1111       Form PC-relative Address       ADR on page A8-32
01100     -          Move Top (16-bit)              MOVT on page A8-200
100x0 a   -          Signed Saturate                SSAT on page A8-362
10010 b   -          Signed Saturate (two 16-bit)   SSAT16 on page A8-364
10100     -          Signed Bit Field Extract       SBFX on page A8-308
10110     not 1111   Bit Field Insert               BFI on page A8-48
10110     1111       Bit Field Clear                BFC on page A8-46
110x0 a   -          Unsigned Saturate              USAT on page A8-504
11010 b   -          Unsigned Saturate 16           USAT16 on page A8-506
11100     -          Unsigned Bit Field Extract     UBFX on page A8-466

a. In the second halfword of the instruction, bits [14:12, 7:6] != 0b00000.
b. In the second halfword of the instruction, bits [14:12, 7:6] == 0b00000.

A6.3.4 Branches and miscellaneous control

Encoding: first halfword bits [15:11] = 0b11110, bits [10:4] = op; second halfword bit [15] = 1, bits [14:12] = op1, bits [11:8] = op2.

Table A6-13 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A6-13 Branches and miscellaneous control instructions

op1   op            op2          Instruction                                    See                                                          Variant
0x0   not x111xxx   -            Conditional branch                             B on page A8-44                                              v6T2
0x0   0111000       xx00         Move to Special Register, application level    MSR (register) on page A8-210                                All
0x0   0111000       xx01, xx1x   Move to Special Register, system level         MSR (register) on page B6-14                                 All
0x0   0111001       -            Move to Special Register, system level         MSR (register) on page B6-14                                 All
0x0   0111010       -            -                                              Change Processor State, and hints on page A6-21              -
0x0   0111011       -            -                                              Miscellaneous control instructions on page A6-21             -
0x0   0111100       -            Branch and Exchange Jazelle                    BXJ on page A8-64                                            v6T2
0x0   0111101       -            Exception Return                               SUBS PC, LR and related instructions on page B6-25           v6T2
0x0   011111x       -            Move from Special Register                     MRS on page A8-206                                           v6T2
000   1111111       -            Secure Monitor Call                            SMC (previously SMI) on page B6-18                           Security Extensions
010   1111111       -            Permanently UNDEFINED. This space will not be allocated in future.   -                                      -
0x1   -             -            Branch                                         B on page A8-44                                              v6T2
1x0   -             -            Branch with Link and Exchange                  BL, BLX (immediate) on page A8-58                            v5T a
1x1   -             -            Branch with Link                               BL, BLX (immediate) on page A8-58                            v4T

a. UNDEFINED in ARMv4T.

Change Processor State, and hints

Encoding: first halfword bits [15:4] = 0b111100111010; second halfword bits [15:14] = 0b10, bit [12] = 0, bits [10:8] = op1, bits [7:0] = op2.

Table A6-14 shows the allocation of encodings in this space. Other encodings in this space are unallocated hints that execute as NOPs. These unallocated hint encodings are reserved and software must not use them.
Table A6-14 Change Processor State, and hint instructions

op1       op2        Instruction               See                    Variant
not 000   -          Change Processor State    CPS on page B6-3       v6T2
000       00000000   No Operation hint         NOP on page A8-222     v6T2
000       00000001   Yield hint                YIELD on page A8-812   v7
000       00000010   Wait For Event hint       WFE on page A8-808     v7
000       00000011   Wait For Interrupt hint   WFI on page A8-810     v7
000       00000100   Send Event hint           SEV on page A8-316     v7
000       1111xxxx   Debug hint                DBG on page A8-88      v7

Miscellaneous control instructions

Encoding: first halfword bits [15:4] = 0b111100111011; second halfword bits [15:14] = 0b10, bit [12] = 0, bits [7:4] = op.

Table A6-15 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED in ARMv7. They are UNPREDICTABLE in ARMv6.

Table A6-15 Miscellaneous control instructions

op     Instruction                           See                           Variant
0000   Leave ThumbEE state a                 ENTERX, LEAVEX on page A9-7   ThumbEE
0001   Enter ThumbEE state                   ENTERX, LEAVEX on page A9-7   ThumbEE
0010   Clear-Exclusive                       CLREX on page A8-70           v7
0100   Data Synchronization Barrier          DSB on page A8-92             v7
0101   Data Memory Barrier                   DMB on page A8-90             v7
0110   Instruction Synchronization Barrier   ISB on page A8-102            v7

a. This instruction is a NOP in Thumb state.

A6.3.5 Load/store multiple

Encoding: first halfword bits [15:9] = 0b1110100, bits [8:7] = op, bit [6] = 0, bit [5] = W, bit [4] = L, bits [3:0] = Rn.

Table A6-16 shows the allocation of encodings in this space.

These encodings are all available in ARMv6T2 and above.
Table A6-16 Load/store multiple instructions

op   L   Rn         Instruction                                           See
00   0   -          Store Return State                                    SRS on page B6-20
00   1   -          Return From Exception                                 RFE on page B6-16
01   0   -          Store Multiple (Increment After, Empty Ascending)     STM / STMIA / STMEA on page A8-374
01   1   not 1101   Load Multiple (Increment After, Full Descending)      LDM / LDMIA / LDMFD on page A8-110
01   1   1101       Pop Multiple Registers from the stack                 POP on page A8-246
10   0   not 1101   Store Multiple (Decrement Before, Full Descending)    STMDB / STMFD on page A8-378
10   0   1101       Push Multiple Registers to the stack.                 PUSH on page A8-248
10   1   -          Load Multiple (Decrement Before, Empty Ascending)     LDMDB / LDMEA on page A8-114
11   0   -          Store Return State                                    SRS on page B6-20
11   1   -          Return From Exception                                 RFE on page B6-16

A6.3.6 Load/store dual, load/store exclusive, table branch

Encoding: first halfword bits [15:9] = 0b1110100, bits [8:7] = op1, bit [6] = 1, bits [5:4] = op2, bits [3:0] = Rn; second halfword bits [7:4] = op3.

Table A6-17 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A6-17 Load/store double or exclusive, table branch

op1   op2   op3    Rn         Instruction                           See                              Variant
00    00    -      -          Store Register Exclusive              STREX on page A8-400             v6T2
00    01    -      -          Load Register Exclusive               LDREX on page A8-142             v6T2
0x    10    -      -          Store Register Dual                   STRD (immediate) on page A8-396  v6T2
1x    x0    -      -          Store Register Dual                   STRD (immediate) on page A8-396  v6T2
0x    11    -      not 1111   Load Register Dual (immediate)        LDRD (immediate) on page A8-136  v6T2
1x    x1    -      not 1111   Load Register Dual (immediate)        LDRD (immediate) on page A8-136  v6T2
0x    11    -      1111       Load Register Dual (literal)          LDRD (literal) on page A8-138    v6T2
1x    x1    -      1111       Load Register Dual (literal)          LDRD (literal) on page A8-138    v6T2
01    00    0100   -          Store Register Exclusive Byte         STREXB on page A8-402            v7
01    00    0101   -          Store Register Exclusive Halfword     STREXH on page A8-406            v7
01    00    0111   -          Store Register Exclusive Doubleword   STREXD on page A8-404            v7
01    01    0000   -          Table Branch Byte                     TBB, TBH on page A8-446          v6T2
01    01    0001   -          Table Branch Halfword                 TBB, TBH on page A8-446          v6T2
01    01    0100   -          Load Register Exclusive Byte          LDREXB on page A8-144            v7
01    01    0101   -          Load Register Exclusive Halfword      LDREXH on page A8-148            v7
01    01    0111   -          Load Register Exclusive Doubleword    LDREXD on page A8-146            v7

A6.3.7 Load word

Encoding: first halfword bits [15:9] = 0b1111100, bits [8:7] = op1, bits [6:4] = 0b101, bits [3:0] = Rn; second halfword bits [11:6] = op2.

Table A6-18 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.

Table A6-18 Load word

op1   op2      Rn         Instruction                  See
01    -        not 1111   Load Register                LDR (immediate, Thumb) on page A8-118
00    1xx1xx   not 1111   Load Register                LDR (immediate, Thumb) on page A8-118
00    1100xx   not 1111   Load Register                LDR (immediate, Thumb) on page A8-118
00    1110xx   not 1111   Load Register Unprivileged   LDRT on page A8-176
00    000000   not 1111   Load Register                LDR (register) on page A8-124
0x    -        1111       Load Register                LDR (literal) on page A8-122
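The decode implied by Table A6-18 can be sketched as below. This is a hedged illustration, not a definitive decoder: the function name classify_load_word is hypothetical, the field positions (op1 = first halfword bits [8:7], Rn = first halfword bits [3:0], op2 = second halfword bits [11:6]) are assumptions taken from the encoding diagram, and the sketch assumes that the op2 patterns 1xx1xx and 1100xx with op1 == 0b00 share the LDR (immediate, Thumb) entry with the op1 == 0b01 row.

```python
def classify_load_word(hw1: int, hw2: int) -> str:
    """Classify a 32-bit Thumb instruction in the Load word space (Table A6-18).

    hw1 and hw2 are the first and second halfwords. Field positions and the
    function name are assumptions made for this sketch.
    """
    if (hw1 >> 9) != 0b1111100 or (hw1 >> 4) & 0b111 != 0b101:
        raise ValueError("instruction is not in the Load word space")
    op1 = (hw1 >> 7) & 0b11
    rn = hw1 & 0xF
    op2 = (hw2 >> 6) & 0x3F
    if op1 & 0b10:
        return "UNDEFINED"                   # no row of the table has op1 == 1x
    if rn == 0b1111:
        return "LDR (literal)"               # op1 == 0x, Rn == 1111
    if op1 == 0b01:
        return "LDR (immediate, Thumb)"      # op2 is a don't-care here
    # op1 == 0b00: op2 selects the addressing form
    if op2 == 0b000000:
        return "LDR (register)"
    if (op2 >> 2) == 0b1110:
        return "LDRT"
    if (op2 >> 2) == 0b1100 or (op2 & 0b100100) == 0b100100:
        return "LDR (immediate, Thumb)"      # 1100xx or 1xx1xx
    return "UNDEFINED"                       # other encodings in this space
```

The Rn == 1111 test is applied before op1 is examined in detail because the literal row covers both op1 values 00 and 01.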
A6.3.8 Load halfword, memory hints

Encoding: first halfword bits [15:9] = 0b1111100, bits [8:7] = op1, bits [6:4] = 0b011, bits [3:0] = Rn; second halfword bits [15:12] = Rt, bits [11:6] = op2.

Table A6-19 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Except where otherwise noted, these encodings are available in ARMv6T2 and above.

Table A6-19 Load halfword, preload

op1   op2      Rn         Rt         Instruction                                  See
0x    -        1111       not 1111   Load Register Halfword                       LDRH (literal) on page A8-154
01    -        not 1111   not 1111   Load Register Halfword                       LDRH (immediate, Thumb) on page A8-150
00    1xx1xx   not 1111   not 1111   Load Register Halfword                       LDRH (immediate, Thumb) on page A8-150
00    1100xx   not 1111   not 1111   Load Register Halfword                       LDRH (immediate, Thumb) on page A8-150
00    1110xx   not 1111   not 1111   Load Register Halfword Unprivileged          LDRHT on page A8-158
00    000000   not 1111   not 1111   Load Register Halfword                       LDRH (register) on page A8-156
1x    -        1111       not 1111   Load Register Signed Halfword                LDRSH (literal) on page A8-170
11    -        not 1111   not 1111   Load Register Signed Halfword                LDRSH (immediate) on page A8-168
10    1xx1xx   not 1111   not 1111   Load Register Signed Halfword                LDRSH (immediate) on page A8-168
10    1100xx   not 1111   not 1111   Load Register Signed Halfword                LDRSH (immediate) on page A8-168
10    1110xx   not 1111   not 1111   Load Register Signed Halfword Unprivileged   LDRSHT on page A8-174
10    000000   not 1111   not 1111   Load Register Signed Halfword                LDRSH (register) on page A8-172
0x    -        1111       1111       UNPREDICTABLE                                -
01    -        not 1111   1111       Preload Data with intent to Write a          PLD, PLDW (immediate) on page A8-236
00    1100xx   not 1111   1111       Preload Data with intent to Write a          PLD, PLDW (immediate) on page A8-236
00    000000   not 1111   1111       Preload Data with intent to Write a          PLD, PLDW (register) on page A8-240
00    1xx1xx   not 1111   1111       UNPREDICTABLE                                -
00    1110xx   not 1111   1111       UNPREDICTABLE                                -
1x    -        1111       1111       Unallocated memory hint (treat as NOP)       -
10    1100xx   not 1111   1111       Unallocated memory hint (treat as NOP)       -
10    000000   not 1111   1111       Unallocated memory hint (treat as NOP)       -
10    1xx1xx   not 1111   1111       UNPREDICTABLE                                -
10    1110xx   not 1111   1111       UNPREDICTABLE                                -
11    -        not 1111   1111       Unallocated memory hint (treat as NOP)       -

a. Available in ARMv7 with the Multiprocessing Extensions. In the ARMv7 base architecture and in ARMv6T2 these are unallocated memory hints (treat as NOP).

A6.3.9 Load byte, memory hints

Encoding: first halfword bits [15:9] = 0b1111100, bits [8:7] = op1, bits [6:4] = 0b001, bits [3:0] = Rn; second halfword bits [15:12] = Rt, bits [11:6] = op2.

Table A6-20 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.

Table A6-20 Load byte, preload

op1   op2      Rn         Rt         Instruction                              See
0x    -        1111       not 1111   Load Register Byte                       LDRB (literal) on page A8-130
01    -        not 1111   not 1111   Load Register Byte                       LDRB (immediate, Thumb) on page A8-126
00    1xx1xx   not 1111   not 1111   Load Register Byte                       LDRB (immediate, Thumb) on page A8-126
00    1100xx   not 1111   not 1111   Load Register Byte                       LDRB (immediate, Thumb) on page A8-126
00    1110xx   not 1111   not 1111   Load Register Byte Unprivileged          LDRBT on page A8-134
00    000000   not 1111   not 1111   Load Register Byte                       LDRB (register) on page A8-132
1x    -        1111       not 1111   Load Register Signed Byte                LDRSB (literal) on page A8-162
11    -        not 1111   not 1111   Load Register Signed Byte                LDRSB (immediate) on page A8-160
10    1xx1xx   not 1111   not 1111   Load Register Signed Byte                LDRSB (immediate) on page A8-160
10    1100xx   not 1111   not 1111   Load Register Signed Byte                LDRSB (immediate) on page A8-160
10    1110xx   not 1111   not 1111   Load Register Signed Byte Unprivileged   LDRSBT on page A8-166
10    000000   not 1111   not 1111   Load Register Signed Byte                LDRSB (register) on page A8-164
0x    -        1111       1111       Preload Data                             PLD (literal) on page A8-238
01    -        not 1111   1111       Preload Data                             PLD, PLDW (immediate) on page A8-236
00    1100xx   not 1111   1111       Preload Data                             PLD, PLDW (immediate) on page A8-236
00    000000   not 1111   1111       Preload Data                             PLD, PLDW (register) on page A8-240
00    1xx1xx   not 1111   1111       UNPREDICTABLE                            -
00    1110xx   not 1111   1111       UNPREDICTABLE                            -
Table A6-20 Load byte, preload (continued)

op1   op2      Rn         Rt     Instruction           See
1x    -        1111       1111   Preload Instruction   PLI (immediate, literal) on page A8-242
11    -        not 1111   1111   Preload Instruction   PLI (immediate, literal) on page A8-242
10    1100xx   not 1111   1111   Preload Instruction   PLI (immediate, literal) on page A8-242
10    000000   not 1111   1111   Preload Instruction   PLI (register) on page A8-244
10    1xx1xx   not 1111   1111   UNPREDICTABLE         -
10    1110xx   not 1111   1111   UNPREDICTABLE         -

A6.3.10 Store single data item

Encoding: first halfword bits [15:8] = 0b11111000, bits [7:5] = op1, bit [4] = 0; second halfword bits [11:6] = op2.

Table A6-21 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.

Table A6-21 Store single data item

op1   op2      Instruction                           See
100   -        Store Register Byte                   STRB (immediate, Thumb) on page A8-388
000   1xx1xx   Store Register Byte                   STRB (immediate, Thumb) on page A8-388
000   1100xx   Store Register Byte                   STRB (immediate, Thumb) on page A8-388
000   1110xx   Store Register Byte Unprivileged      STRBT on page A8-394
000   0xxxxx   Store Register Byte                   STRB (register) on page A8-392
101   -        Store Register Halfword               STRH (immediate, Thumb) on page A8-408
001   1xx1xx   Store Register Halfword               STRH (immediate, Thumb) on page A8-408
001   1100xx   Store Register Halfword               STRH (immediate, Thumb) on page A8-408
001   1110xx   Store Register Halfword Unprivileged  STRHT on page A8-414
001   0xxxxx   Store Register Halfword               STRH (register) on page A8-412
110   -        Store Register (immediate)            STR (immediate, Thumb) on page A8-382
010   1xx1xx   Store Register (immediate)            STR (immediate, Thumb) on page A8-382
010   1100xx   Store Register (immediate)            STR (immediate, Thumb) on page A8-382
010   1110xx   Store Register Unprivileged           STRT on page A8-416
010   0xxxxx   Store Register (register)             STR (register) on page A8-386

A6.3.11 Data-processing (shifted register)

Encoding: first halfword bits [15:9] = 0b1110101, bits [8:5] = op, bit [4] = S, bits [3:0] = Rn; second halfword bits [14:12] = imm3, bits [11:8] = Rd, bits [7:6] = imm2, bits [5:4] = type.

Table A6-22 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.
Table A6-22 Data-processing (shifted register)

op     Rn         Rd         S   Instruction            See
0000   -          not 1111   x   Bitwise AND            AND (register) on page A8-36
0000   -          1111       0   UNPREDICTABLE          -
0000   -          1111       1   Test                   TST (register) on page A8-456
0001   -          -          -   Bitwise Bit Clear      BIC (register) on page A8-52
0010   not 1111   -          -   Bitwise OR             ORR (register) on page A8-230
0010   1111       -          -   Move                   MOV (register) on page A8-196
0011   not 1111   -          -   Bitwise OR NOT         ORN (register) on page A8-226
0011   1111       -          -   Bitwise NOT            MVN (register) on page A8-216
0100   -          not 1111   -   Bitwise Exclusive OR   EOR (register) on page A8-96
0100   -          1111       0   UNPREDICTABLE          -
0100   -          1111       1   Test Equivalence       TEQ (register) on page A8-450
0110   -          -          -   Pack Halfword          PKH on page A8-234
1000   -          not 1111   -   Add                    ADD (register) on page A8-24
1000   -          1111       0   UNPREDICTABLE          -
1000   -          1111       1   Compare Negative       CMN (register) on page A8-76
1010   -          -          -   Add with Carry         ADC (register) on page A8-16
1011   -          -          -   Subtract with Carry    SBC (register) on page A8-304
1101   -          not 1111   -   Subtract               SUB (register) on page A8-422
1101   -          1111       0   UNPREDICTABLE          -
1101   -          1111       1   Compare                CMP (register) on page A8-82
1110   -          -          -   Reverse Subtract       RSB (register) on page A8-286

A6.3.12 Data-processing (register)

Encoding: first halfword bits [15:8] = 0b11111010, bits [7:4] = op1, bits [3:0] = Rn; second halfword bits [15:12] = 0b1111, bits [7:4] = op2.

If, in the second halfword of the instruction, bits [15:12] != 0b1111, the instruction is UNDEFINED.

Table A6-23 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.
Table A6-23 Data-processing (register)

op1    op2    Rn         Instruction                       See
000x   0000   -          Logical Shift Left                LSL (register) on page A8-180
001x   0000   -          Logical Shift Right               LSR (register) on page A8-184
010x   0000   -          Arithmetic Shift Right            ASR (register) on page A8-42
011x   0000   -          Rotate Right                      ROR (register) on page A8-280
0000   1xxx   not 1111   Signed Extend and Add Halfword    SXTAH on page A8-438
0000   1xxx   1111       Signed Extend Halfword            SXTH on page A8-444
0001   1xxx   not 1111   Unsigned Extend and Add Halfword  UXTAH on page A8-518
0001   1xxx   1111       Unsigned Extend Halfword          UXTH on page A8-524
0010   1xxx   not 1111   Signed Extend and Add Byte 16     SXTAB16 on page A8-436
0010   1xxx   1111       Signed Extend Byte 16             SXTB16 on page A8-442
0011   1xxx   not 1111   Unsigned Extend and Add Byte 16   UXTAB16 on page A8-516
0011   1xxx   1111       Unsigned Extend Byte 16           UXTB16 on page A8-522
0100   1xxx   not 1111   Signed Extend and Add Byte        SXTAB on page A8-434
0100   1xxx   1111       Signed Extend Byte                SXTB on page A8-440
0101   1xxx   not 1111   Unsigned Extend and Add Byte      UXTAB on page A8-514
0101   1xxx   1111       Unsigned Extend Byte              UXTB on page A8-520
1xxx   00xx   -          -                                 Parallel addition and subtraction, signed on page A6-35
1xxx   01xx   -          -                                 Parallel addition and subtraction, unsigned on page A6-36
10xx   10xx   -          -                                 Miscellaneous operations on page A6-37

A6.3.13 Parallel addition and subtraction, signed

Encoding: first halfword bits [15:7] = 0b111110101, bits [6:4] = op1; second halfword bits [15:12] = 0b1111, bits [7:6] = 0b00, bits [5:4] = op2.

If, in the second halfword of the instruction, bits [15:12] != 0b1111, the instruction is UNDEFINED.

Table A6-24 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.
Table A6-24 Signed parallel addition and subtraction instructions

op1   op2   Instruction                  See
001   00    Add 16-bit                   SADD16 on page A8-296
010   00    Add, Subtract                SASX on page A8-300
110   00    Subtract, Add                SSAX on page A8-366
101   00    Subtract 16-bit              SSUB16 on page A8-368
000   00    Add 8-bit                    SADD8 on page A8-298
100   00    Subtract 8-bit               SSUB8 on page A8-370

Saturating instructions
001   01    Saturating Add 16-bit        QADD16 on page A8-252
010   01    Saturating Add, Subtract     QASX on page A8-256
110   01    Saturating Subtract, Add     QSAX on page A8-262
101   01    Saturating Subtract 16-bit   QSUB16 on page A8-266
000   01    Saturating Add 8-bit         QADD8 on page A8-254
100   01    Saturating Subtract 8-bit    QSUB8 on page A8-268

Halving instructions
001   10    Halving Add 16-bit           SHADD16 on page A8-318
010   10    Halving Add, Subtract        SHASX on page A8-322
110   10    Halving Subtract, Add        SHSAX on page A8-324
101   10    Halving Subtract 16-bit      SHSUB16 on page A8-326
000   10    Halving Add 8-bit            SHADD8 on page A8-320
100   10    Halving Subtract 8-bit       SHSUB8 on page A8-328

A6.3.14 Parallel addition and subtraction, unsigned

Encoding: first halfword bits [15:7] = 0b111110101, bits [6:4] = op1; second halfword bits [15:12] = 0b1111, bits [7:6] = 0b01, bits [5:4] = op2.

If, in the second halfword of the instruction, bits [15:12] != 0b1111, the instruction is UNDEFINED.

Table A6-25 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.
Table A6-25 Unsigned parallel addition and subtraction instructions

op1  op2  Instruction                 See
001  00   Add 16-bit                  UADD16 on page A8-460
010  00   Add, Subtract               UASX on page A8-464
110  00   Subtract, Add               USAX on page A8-508
101  00   Subtract 16-bit             USUB16 on page A8-510
000  00   Add 8-bit                   UADD8 on page A8-462
100  00   Subtract 8-bit              USUB8 on page A8-512
Saturating instructions
001  01   Saturating Add 16-bit       UQADD16 on page A8-488
010  01   Saturating Add, Subtract    UQASX on page A8-492
110  01   Saturating Subtract, Add    UQSAX on page A8-494
101  01   Saturating Subtract 16-bit  UQSUB16 on page A8-496
000  01   Saturating Add 8-bit        UQADD8 on page A8-490
100  01   Saturating Subtract 8-bit   UQSUB8 on page A8-498
Halving instructions
001  10   Halving Add 16-bit          UHADD16 on page A8-470
010  10   Halving Add, Subtract       UHASX on page A8-474
110  10   Halving Subtract, Add       UHSAX on page A8-476
101  10   Halving Subtract 16-bit     UHSUB16 on page A8-478
000  10   Halving Add 8-bit           UHADD8 on page A8-472
100  10   Halving Subtract 8-bit      UHSUB8 on page A8-480

A6.3.15 Miscellaneous operations

Encoding (two halfwords, bits 15 to 0 each):
  First halfword:   bits [15:6] = 1 1 1 1 1 0 1 0 1 0, bits [5:4] = op1
  Second halfword:  bits [15:12] = 1 1 1 1, bits [7:6] = 1 0, bits [5:4] = op2

If, in the second halfword of the instruction, bits [15:12] != 0b1111, the instruction is UNDEFINED.
Table A6-26 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
Table A6-26 Miscellaneous operations

op1  op2  Instruction                     See
00   00   Saturating Add                  QADD on page A8-250
00   01   Saturating Double and Add       QDADD on page A8-258
00   10   Saturating Subtract             QSUB on page A8-264
00   11   Saturating Double and Subtract  QDSUB on page A8-260
01   00   Byte-Reverse Word               REV on page A8-272
01   01   Byte-Reverse Packed Halfword    REV16 on page A8-274
01   10   Reverse Bits                    RBIT on page A8-270
01   11   Byte-Reverse Signed Halfword    REVSH on page A8-276
10   00   Select Bytes                    SEL on page A8-312
11   00   Count Leading Zeros             CLZ on page A8-72

A6.3.16 Multiply, multiply accumulate, and absolute difference

Encoding (two halfwords, bits 15 to 0 each):
  First halfword:   bits [15:7] = 1 1 1 1 1 0 1 1 0, bits [6:4] = op1
  Second halfword:  bits [15:12] = Ra, bits [7:6] = 0 0, bits [5:4] = op2

If, in the second halfword of the instruction, bits [7:6] != 0b00, the instruction is UNDEFINED.
Table A6-27 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
Table A6-27 Multiply, multiply accumulate, and absolute difference operations

op1  op2  Ra        Instruction                                        See
000  00   not 1111  Multiply Accumulate                                MLA on page A8-190
000  00   1111      Multiply                                           MUL on page A8-212
000  01   -         Multiply and Subtract                              MLS on page A8-192
001  -    not 1111  Signed Multiply Accumulate (Halfwords)             SMLABB, SMLABT, SMLATB, SMLATT on page A8-330
001  -    1111      Signed Multiply (Halfwords)                        SMULBB, SMULBT, SMULTB, SMULTT on page A8-354
010  0x   not 1111  Signed Multiply Accumulate Dual                    SMLAD on page A8-332
010  0x   1111      Signed Dual Multiply Add                           SMUAD on page A8-352
011  0x   not 1111  Signed Multiply Accumulate (Word by halfword)      SMLAWB, SMLAWT on page A8-340
011  0x   1111      Signed Multiply (Word by halfword)                 SMULWB, SMULWT on page A8-358
100  0x   not 1111  Signed Multiply Subtract Dual                      SMLSD on page A8-342
100  0x   1111      Signed Dual Multiply Subtract                      SMUSD on page A8-360
101  0x   not 1111  Signed Most Significant Word Multiply Accumulate   SMMLA on page A8-346
101  0x   1111      Signed Most Significant Word Multiply              SMMUL on page A8-350
110  0x   -         Signed Most Significant Word Multiply Subtract     SMMLS on page A8-348
111  00   1111      Unsigned Sum of Absolute Differences               USAD8 on page A8-500
111  00   not 1111  Unsigned Sum of Absolute Differences, Accumulate   USADA8 on page A8-502

A6.3.17 Long multiply, long multiply accumulate, and divide

Encoding (two halfwords, bits 15 to 0 each):
  First halfword:   bits [15:7] = 1 1 1 1 1 0 1 1 1, bits [6:4] = op1
  Second halfword:  bits [7:4] = op2

Table A6-28 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A6-28 Long multiply, long multiply accumulate, and divide operations

op1  op2   Instruction                                  See                                              Variant
000  0000  Signed Multiply Long                         SMULL on page A8-356                             v6T2
001  1111  Signed Divide                                SDIV on page A8-310                              v7-R a
010  0000  Unsigned Multiply Long                       UMULL on page A8-486                             v6T2
011  1111  Unsigned Divide                              UDIV on page A8-468                              v7-R a
100  0000  Signed Multiply Accumulate Long              SMLAL on page A8-334                             v6T2
100  10xx  Signed Multiply Accumulate Long (Halfwords)  SMLALBB, SMLALBT, SMLALTB, SMLALTT on page A8-336  v6T2
100  110x  Signed Multiply Accumulate Long Dual         SMLALD on page A8-338                            v6T2
101  110x  Signed Multiply Subtract Long Dual           SMLSLD on page A8-344                            v6T2
110  0000  Unsigned Multiply Accumulate Long            UMLAL on page A8-484                             v6T2
110  0110  Unsigned Multiply Accumulate Accumulate Long UMAAL on page A8-482                             v6T2

a. UNDEFINED in ARMv7-A.

A6.3.18 Coprocessor instructions

Encoding (two halfwords, bits 15 to 0 each):
  First halfword:   bits [15:13] = 1 1 1, bits [11:10] = 1 1, bits [9:4] = op1, bits [3:0] = Rn
  Second halfword:  bits [11:8] = coproc, bit [4] = op

Table A6-29 shows the allocation of encodings in this space.
These encodings are all available in ARMv6T2 and above.
Table A6-29 Coprocessor instructions

op1                     op  coproc    Rn        Instructions                                      See
000x1x, 001xxx, 01xxxx  -   101x      -         Advanced SIMD, VFP                                Extension register load/store instructions on page A7-26
000x10, 001xx0, 01xxx0  -   not 101x  -         Store Coprocessor                                 STC, STC2 on page A8-372
000x11, 001xx1, 01xxx1  -   not 101x  not 1111  Load Coprocessor (immediate)                      LDC, LDC2 (immediate) on page A8-106
000x11, 001xx1, 01xxx1  -   not 101x  1111      Load Coprocessor (literal)                        LDC, LDC2 (literal) on page A8-108
00000x                  -   -         -         UNDEFINED                                         -
00010x                  -   101x      -         Advanced SIMD, VFP                                64-bit transfers between ARM core and extension registers on page A7-32
000100                  -   not 101x  -         Move to Coprocessor from two ARM core registers   MCRR, MCRR2 on page A8-188
000101                  -   not 101x  -         Move to two ARM core registers from Coprocessor   MRRC, MRRC2 on page A8-204
10xxxx                  0   101x      -         VFP                                               VFP data-processing instructions on page A7-24
10xxxx                  0   not 101x  -         Coprocessor data operations                       CDP, CDP2 on page A8-68
10xxxx                  1   101x      -         Advanced SIMD, VFP                                8, 16, and 32-bit transfer between ARM core and extension registers on page A7-31
10xxx0                  1   not 101x  -         Move to Coprocessor from ARM core register        MCR, MCR2 on page A8-186
10xxx1                  1   not 101x  -         Move to ARM core register from Coprocessor        MRC, MRC2 on page A8-202
11xxxx                  -   -         -         Advanced SIMD                                     Advanced SIMD data-processing instructions on page A7-10

For more information about specific coprocessors see Coprocessor support on page A2-68.

Chapter A7 Advanced SIMD and VFP Instruction Encoding

This chapter gives an overview of the Advanced SIMD and VFP instruction sets.
It contains the following sections:
• Overview
• Advanced SIMD and VFP instruction syntax
• Register encoding
• Advanced SIMD data-processing instructions
• VFP data-processing instructions
• Extension register load/store instructions
• Advanced SIMD element or structure load/store instructions
• 8, 16, and 32-bit transfer between ARM core and extension registers
• 64-bit transfers between ARM core and extension registers.

Note
• The Advanced SIMD architecture extension, its associated implementations, and supporting software, are commonly referred to as NEON™ technology.
• In the decode tables in this chapter, an entry of - for a field value means the value of the field does not affect the decoding.

A7.1 Overview

All Advanced SIMD and VFP instructions are available in both ARM state and Thumb state.

A7.1.1 Advanced SIMD

The following sections describe the classes of instruction in the Advanced SIMD extension:
• Advanced SIMD data-processing instructions on page A7-10
• Advanced SIMD element or structure load/store instructions on page A7-27
• Extension register load/store instructions on page A7-26
• 8, 16, and 32-bit transfer between ARM core and extension registers on page A7-31
• 64-bit transfers between ARM core and extension registers on page A7-32.

A7.1.2 VFP

The following sections describe the classes of instruction in the VFP extension:
• Extension register load/store instructions on page A7-26
• 8, 16, and 32-bit transfer between ARM core and extension registers on page A7-31
• 64-bit transfers between ARM core and extension registers on page A7-32
• VFP data-processing instructions on page A7-24.
A7.2 Advanced SIMD and VFP instruction syntax

Advanced SIMD and VFP instructions use the general conventions of the ARM instruction set.
Advanced SIMD and VFP data-processing instructions use the following general format:

V{<modifier>}<operation>{<shape>}<c><q>{.<dt>} {<dest>,} <src1>, <src2>

All Advanced SIMD and VFP instructions begin with a V. This distinguishes Advanced SIMD vector and VFP instructions from ARM scalar instructions.
The main operation is specified in the <operation> field. It is usually a three-letter mnemonic, the same as or similar to that of the corresponding scalar integer instruction.
The <c> and <q> fields are standard assembler syntax fields. For details see Standard assembler syntax fields on page A8-7.

A7.2.1 Advanced SIMD Instruction modifiers

The <modifier> field provides additional variants of some instructions. Table A7-1 provides definitions of the modifiers. Modifiers are not available for every instruction.

Table A7-1 Advanced SIMD instruction modifiers

<modifier>  Meaning
Q           The operation uses saturating arithmetic.
R           The operation performs rounding.
D           The operation doubles the result (before accumulation, if any).
H           The operation halves the result.

A7.2.2 Advanced SIMD Operand shapes

The <shape> field provides additional variants of some instructions. Table A7-2 provides definitions of the shapes. Operand shapes are not available for every instruction.

Table A7-2 Advanced SIMD operand shapes

<shape>  Meaning                                                                                Typical register shape
(none)   The operands and result are all the same width.                                        Dd, Dn, Dm or Qd, Qn, Qm
L        Long operation - result is twice the width of both operands                            Qd, Dn, Dm
N        Narrow operation - result is half the width of both operands                           Dd, Qn, Qm
W        Wide operation - result and first operand are twice the width of the second operand    Qd, Qn, Dm

A7.2.3 Data type specifiers

The <dt> field normally contains one data type specifier. This indicates the data type contained in:
• the second operand, if any
• the operand, if there is no second operand
• the result, if there are no operand registers.

The data types of the other operand and result are implied by the <dt> field combined with the instruction shape. For information about data type formats see Data types supported by the Advanced SIMD extension on page A2-25.
In the instruction syntax descriptions in Chapter A8 Instruction Details, the <dt> field is usually specified as a single field. However, where more convenient, it is sometimes specified as a concatenation of two fields, <type><size>.

Syntax flexibility

There is some flexibility in the data type specifier syntax:
• You can specify three data types, specifying the result and both operand data types. For example:
  VSUBW.I16.I16.S8 Q3,Q5,D0
  instead of:
  VSUBW.S8 Q3,Q5,D0
• You can specify two data types, specifying the data types of the two operands. The data type of the result is implied by the instruction shape.
• You can specify two data types, specifying the data types of the single operand and the result.
• Where an instruction requires a less specific data type, you can instead specify a more specific type, as shown in Table A7-3.
• Where an instruction does not require a data type, you can provide one.
• The F32 data type can be abbreviated to F.
• The F64 data type can be abbreviated to D.

In all cases, if you provide additional information, the additional information must match the instruction shape. Disassembly does not regenerate this additional information.

Table A7-3 Data type specification flexibility

Specified data type  Permitted more specific data types
None                 Any
.I<size>             -     .S<size>  .U<size>  -     -
.8                   .I8   .S8       .U8       .P8   -
.16                  .I16  .S16      .U16      .P16  .F16
.32                  .I32  .S32      .U32      -     .F32 or .F
.64                  .I64  .S64      .U64      -     .F64 or .D

A7.2.4 Register specifiers

The <dest>, <src1>, and <src2> fields contain register specifiers, or in some cases scalar specifiers or register lists. Table A7-4 shows the register and scalar specifier formats that appear in the instruction descriptions.
If <dest> is omitted, it is the same as <src1>.
Table A7-4 Advanced SIMD and VFP register specifier formats

<specifier>  Usual meaning a
<Qd>         A quadword destination register for the result vector (Advanced SIMD only).
<Qn>         A quadword source register for the first operand vector (Advanced SIMD only).
<Qm>         A quadword source register for the second operand vector (Advanced SIMD only).
<Dd>         A doubleword destination register for the result vector.
<Dn>         A doubleword source register for the first operand vector.
<Dm>         A doubleword source register for the second operand vector.
<Sd>         A singleword destination register for the result vector (VFP only).
<Sn>         A singleword source register for the first operand vector (VFP only).
<Sm>         A singleword source register for the second operand vector (VFP only).
<Dd[x]>      A destination scalar for the result. Element x of vector <Dd>. (Advanced SIMD only).
<Dn[x]>      A source scalar for the first operand. Element x of vector <Dn>. (Advanced SIMD only).
<Dm[x]>      A source scalar for the second operand. Element x of vector <Dm>. (Advanced SIMD only).
<Rt>         An ARM core register. Can be source or destination.
<Rt2>        An ARM core register. Can be source or destination.

a. In some instructions the roles of registers are different.

A7.2.5 Register lists

A register list is a list of register specifiers separated by commas and enclosed in braces, { and }. There are restrictions on what registers can appear in a register list. These restrictions are described in the individual instruction descriptions. Table A7-5 shows some register list formats, with examples of actual register lists corresponding to those formats.

Note
Register lists must not wrap around the end of the register bank.

Syntax flexibility

There is some flexibility in the register list syntax:
• Where a register list contains consecutive registers, they can be specified as a range, instead of listing every register, for example {D0-D3} instead of {D0,D1,D2,D3}.
• Where a register list contains an even number of consecutive doubleword registers starting with an even numbered register, it can be written as a list of quadword registers instead, for example {Q1,Q2} instead of {D2-D5}.
• Where a register list contains only one register, the enclosing braces can be omitted, for example VLD1.8 D0,[R0] instead of VLD1.8 {D0},[R0].

Table A7-5 Example register lists

Format                  Example      Alternative
{<Dd>}                  {D3}         D3
{<Dd>,<Dd+1>,<Dd+2>}    {D3,D4,D5}   {D3-D5}
{<Dd[x]>,<Dd+2[x]>}
{<Dd[]>}                {D7[]}       D7[]

A7.3 Register encoding

Advanced SIMD registers are either quadword (128 bits wide) or doubleword (64 bits wide). Some instructions have options for either doubleword or quadword registers. This is normally encoded in Q (bit [6]) as Q = 0 for doubleword operations, Q = 1 for quadword operations.
VFP registers are either double-precision (64 bits wide) or single-precision (32 bits wide). This is encoded in the sz field (bit [8]) as sz = 1 for double-precision operations, or sz = 0 for single-precision operations.
Some instructions use only one or two registers, and use the unused register fields as additional opcode bits.
Table A7-6 shows the encodings for the registers.

Thumb encoding (two halfwords, bits 15 to 0 each):
  First halfword:   bit [6] = D, bits [3:0] = Vn
  Second halfword:  bits [15:12] = Vd, bit [8] = sz, bit [7] = N, bit [6] = Q, bit [5] = M, bits [3:0] = Vm

ARM encoding (bits 31 to 0):
  bit [22] = D, bits [19:16] = Vn, bits [15:12] = Vd, bit [8] = sz, bit [7] = N, bit [6] = Q, bit [5] = M, bits [3:0] = Vm

Table A7-6 Encoding of register numbers

Register mnemonic  Usual usage                        Register number encoded in   Notes a         Used in
<Qd>               Destination (quadword)             D, Vd (bits [22,15:13])      bit [12] == 0   Adv. SIMD
<Qn>               First operand (quadword)           N, Vn (bits [7,19:17])       bit [16] == 0   Adv. SIMD
<Qm>               Second operand (quadword)          M, Vm (bits [5,3:1])         bit [0] == 0    Adv. SIMD
<Dd>               Destination (doubleword)           D, Vd (bits [22,15:12])      -               Both
<Dn>               First operand (doubleword)         N, Vn (bits [7,19:16])       -               Both
<Dm>               Second operand (doubleword)        M, Vm (bits [5,3:0])         -               Both
<Sd>               Destination (single-precision)     Vd, D (bits [15:12,22])      -               VFP
<Sn>               First operand (single-precision)   Vn, N (bits [19:16,7])       -               VFP
<Sm>               Second operand (single-precision)  Vm, M (bits [3:0,5])         -               VFP

a. If one of these bits is 1, the instruction is UNDEFINED.

A7.3.1 Advanced SIMD scalars

Advanced SIMD scalars can be 8-bit, 16-bit, 32-bit, or 64-bit. Instructions other than multiply instructions can access any element in the register set. The instruction syntax refers to the scalars using an index into a doubleword vector. The descriptions of the individual instructions contain details of the encodings.
Table A7-7 shows the form of encoding for scalars used in multiply instructions. These instructions cannot access scalars in some registers. The descriptions of the individual instructions contain cross references to this section where appropriate.
32-bit Advanced SIMD scalars, when used as single-precision floating-point numbers, are equivalent to VFP single-precision registers. That is, Dm[x] in a 32-bit context (0 <= m <= 15, 0 <= x <= 1) is equivalent to S[2m + x].

Table A7-7 Encoding of scalars in multiply instructions

Scalar mnemonic  Usual usage     Scalar size  Register specifier  Index specifier  Accessible registers
<Dm[x]>          Second operand  16-bit       Vm[2:0]             M, Vm[3]         D0-D7
<Dm[x]>          Second operand  32-bit       Vm[3:0]             M                D0-D15
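The register-number rules in Table A7-6 and the scalar rules in Table A7-7 can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the manual's pseudocode: it works on the 32-bit ARM encoding, and the helper names (bits, dest_register, multiply_scalar, scalar_to_s) are hypothetical names chosen for this example.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive bit range) as an integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def dest_register(word, kind):
    """Decode the destination register number from an ARM-encoding word.

    kind is 'Q', 'D' or 'S', matching the destination rows of Table A7-6.
    Returns None where the table's note marks the encoding as UNDEFINED.
    """
    d = bits(word, 22, 22)
    vd = bits(word, 15, 12)
    if kind == "D":
        return (d << 4) | vd           # D:Vd, bits [22,15:12]
    if kind == "S":
        return (vd << 1) | d           # Vd:D, bits [15:12,22]
    if kind == "Q":
        if vd & 1:                     # bit [12] must be 0
            return None
        return (d << 3) | (vd >> 1)    # D:Vd[3:1], bits [22,15:13]
    raise ValueError("kind must be 'Q', 'D' or 'S'")


def multiply_scalar(word, size):
    """Decode the <Dm[x]> scalar of a multiply instruction (Table A7-7).

    Returns (register number, index) for a 16-bit or 32-bit scalar.
    """
    m = bits(word, 5, 5)
    vm = bits(word, 3, 0)
    if size == 16:
        return (vm & 0b111, (m << 1) | (vm >> 3))   # Vm[2:0], index M:Vm[3]
    if size == 32:
        return (vm, m)                              # Vm[3:0], index M
    raise ValueError("size must be 16 or 32")


def scalar_to_s(m, x):
    """Map the 32-bit scalar Dm[x] to its single-precision register number."""
    if not (0 <= m <= 15 and 0 <= x <= 1):
        raise ValueError("requires 0 <= m <= 15 and 0 <= x <= 1")
    return 2 * m + x
```

For example, with D = 1 and Vd = 0b0010 the same field values name D18, Q9 or S5 depending on the register kind, which is why the instruction's operand size has to be known before the register fields can be interpreted.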
A7.4 Advanced SIMD data-processing instructions

Thumb encoding (two halfwords, bits 15 to 0 each):
  First halfword:   bits [15:13] = 1 1 1, bit [12] = U, bits [11:8] = 1 1 1 1, bits [7:3] = A
  Second halfword:  bits [11:8] = B, bits [7:4] = C

ARM encoding (bits 31 to 0):
  bits [31:25] = 1 1 1 1 0 0 1, bit [24] = U, bits [23:19] = A, bits [11:8] = B, bits [7:4] = C

Table A7-8 shows the encoding for Advanced SIMD data-processing instructions. Other encodings in this space are UNDEFINED.
In these instructions, the U bit is in a different location in ARM and Thumb instructions. This is bit [12] of the first halfword in the Thumb encoding, and bit [24] in the ARM encoding. Other variable bits are in identical locations in the two encodings, after adjusting for the fact that the ARM encoding is held in memory as a single word and the Thumb encoding is held as two consecutive halfwords.
The ARM instructions can only be executed unconditionally. The Thumb instructions can be executed conditionally by using the IT instruction. For details see IT on page A8-104.

Table A7-8 Data-processing instructions

U  A      B     C     See
-  0xxxx  -     -     Three registers of the same length on page A7-12
-  1x000  -     0xx1  One register and a modified immediate value on page A7-21
-  1x001  -     0xx1  Two registers and a shift amount on page A7-17
-  1x01x  -     0xx1  Two registers and a shift amount on page A7-17
-  1x1xx  -     0xx1  Two registers and a shift amount on page A7-17
-  1xxxx  -     1xx1  Two registers and a shift amount on page A7-17
-  1x0xx  -     x0x0  Three registers of different lengths on page A7-15
-  1x10x  -     x0x0  Three registers of different lengths on page A7-15
-  1x0xx  -     x1x0  Two registers and a scalar on page A7-16
-  1x10x  -     x1x0  Two registers and a scalar on page A7-16
0  1x11x  -     xxx0  Vector Extract, VEXT on page A8-598
1  1x11x  0xxx  xxx0  Two registers, miscellaneous on page A7-19
1  1x11x  10xx  xxx0  Vector Table Lookup, VTBL, VTBX on page A8-798
1  1x11x  1100  0xx0  Vector Duplicate, VDUP (scalar) on page A8-592
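As a rough sketch of how Table A7-8 can be read as a decoder, the following classifies a 32-bit ARM-encoding word into one of the instruction groups. It assumes the field positions given for the ARM encoding (U = bit [24], A = bits [23:19], B = bits [11:8], C = bits [7:4]); the names matches, make_word and classify are hypothetical helpers for this example, and the row list simply transcribes the table.

```python
# Rows of Table A7-8 as (U, A, B, C, group); 'x' is a don't-care bit and
# '-' means the field does not affect the decoding.
TABLE_A7_8 = [
    ("-", "0xxxx", "-", "-", "Three registers of the same length"),
    ("-", "1x000", "-", "0xx1", "One register and a modified immediate value"),
    ("-", "1x001", "-", "0xx1", "Two registers and a shift amount"),
    ("-", "1x01x", "-", "0xx1", "Two registers and a shift amount"),
    ("-", "1x1xx", "-", "0xx1", "Two registers and a shift amount"),
    ("-", "1xxxx", "-", "1xx1", "Two registers and a shift amount"),
    ("-", "1x0xx", "-", "x0x0", "Three registers of different lengths"),
    ("-", "1x10x", "-", "x0x0", "Three registers of different lengths"),
    ("-", "1x0xx", "-", "x1x0", "Two registers and a scalar"),
    ("-", "1x10x", "-", "x1x0", "Two registers and a scalar"),
    ("0", "1x11x", "-", "xxx0", "VEXT"),
    ("1", "1x11x", "0xxx", "xxx0", "Two registers, miscellaneous"),
    ("1", "1x11x", "10xx", "xxx0", "VTBL, VTBX"),
    ("1", "1x11x", "1100", "0xx0", "VDUP (scalar)"),
]


def matches(value, pattern):
    """True if value matches a bit pattern such as '1x0xx'."""
    if pattern == "-":
        return True
    text = format(value, "0{}b".format(len(pattern)))
    return all(p in ("x", b) for p, b in zip(pattern, text))


def make_word(u, a, b, c):
    """Build an ARM-encoding word with the fixed bits [31:25] = 1111001."""
    return (0b1111001 << 25) | (u << 24) | (a << 19) | (b << 8) | (c << 4)


def classify(word):
    """Return the Table A7-8 group for word, 'UNDEFINED', or None if the
    word is outside the Advanced SIMD data-processing space."""
    if (word >> 25) != 0b1111001:
        return None
    u = (word >> 24) & 1
    a = (word >> 19) & 0b11111
    b = (word >> 8) & 0b1111
    c = (word >> 4) & 0b1111
    for pu, pa, pb, pc, group in TABLE_A7_8:
        if matches(u, pu) and matches(a, pa) and matches(b, pb) and matches(c, pc):
            return group
    return "UNDEFINED"
```

A table-driven form like this keeps the decoder visibly close to the published table, at the cost of speed. A production decoder would more likely use masks and compares.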
A7.4.1 Three registers of the same length

Thumb encoding (two halfwords, bits 15 to 0 each):
  First halfword:   bits [15:13] = 1 1 1, bit [12] = U, bits [11:8] = 1 1 1 1, bit [7] = 0, bits [5:4] = C
  Second halfword:  bits [11:8] = A, bit [4] = B

ARM encoding (bits 31 to 0):
  bits [31:25] = 1 1 1 1 0 0 1, bit [24] = U, bit [23] = 0, bits [21:20] = C, bits [11:8] = A, bit [4] = B

Table A7-9 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A7-9 Three registers of the same length

A     B  U  C   Instruction                                  See
0000  0  -  -   Vector Halving Add                           VHADD, VHSUB on page A8-600
0000  1  -  -   Vector Saturating Add                        VQADD on page A8-700
0001  0  -  -   Vector Rounding Halving Add                  VRHADD on page A8-734
0001  1  0  00  Vector Bitwise AND                           VAND (register) on page A8-544
0001  1  0  01  Vector Bitwise Bit Clear (AND complement)    VBIC (register) on page A8-548
0001  1  0  10  Vector Bitwise OR (if source registers differ)    VORR (register) on page A8-680
                Vector Move (if source registers identical)       VMOV (register) on page A8-642
0001  1  0  11  Vector Bitwise OR NOT                        VORN (register) on page A8-676
0001  1  1  00  Vector Bitwise Exclusive OR                  VEOR on page A8-596
0001  1  1  01  Vector Bitwise Select                        VBIF, VBIT, VBSL on page A8-550
0001  1  1  10  Vector Bitwise Insert if True                VBIF, VBIT, VBSL on page A8-550
0001  1  1  11  Vector Bitwise Insert if False               VBIF, VBIT, VBSL on page A8-550
0010  0  -  -   Vector Halving Subtract                      VHADD, VHSUB on page A8-600
0010  1  -  -   Vector Saturating Subtract                   VQSUB on page A8-724
0011  0  -  -   Vector Compare Greater Than                  VCGT (register) on page A8-560
0011  1  -  -   Vector Compare Greater Than or Equal         VCGE (register) on page A8-556
Table A7-9 Three registers of the same length (continued)

A     B  U  C   Instruction                                                       See
0100  0  -  -   Vector Shift Left                                                 VSHL (register) on page A8-752
0100  1  -  -   Vector Saturating Shift Left                                      VQSHL (register) on page A8-718
0101  0  -  -   Vector Rounding Shift Left                                        VRSHL on page A8-736
0101  1  -  -   Vector Saturating Rounding Shift Left                             VQRSHL on page A8-714
0110  -  -  -   Vector Maximum or Minimum                                         VMAX, VMIN (integer) on page A8-630
0111  0  -  -   Vector Absolute Difference                                        VABD, VABDL (integer) on page A8-528
0111  1  -  -   Vector Absolute Difference and Accumulate                         VABA, VABAL on page A8-526
1000  0  0  -   Vector Add                                                        VADD (integer) on page A8-536
1000  0  1  -   Vector Subtract                                                   VSUB (integer) on page A8-788
1000  1  0  -   Vector Test Bits                                                  VTST on page A8-802
1000  1  1  -   Vector Compare Equal                                              VCEQ (register) on page A8-552
1001  0  -  -   Vector Multiply Accumulate or Subtract                            VMLA, VMLAL, VMLS, VMLSL (integer) on page A8-634
1001  1  -  -   Vector Multiply                                                   VMUL, VMULL (integer and polynomial) on page A8-662
1010  -  -  -   Vector Pairwise Maximum or Minimum                                VPMAX, VPMIN (integer) on page A8-690
1011  0  0  -   Vector Saturating Doubling Multiply Returning High Half           VQDMULH on page A8-704
1011  0  1  -   Vector Saturating Rounding Doubling Multiply Returning High Half  VQRDMULH on page A8-712
1011  1  0  -   Vector Pairwise Add                                               VPADD (integer) on page A8-684
Table A7-9 Three registers of the same length (continued)

A     B  U  C   Instruction                                               See
1101  0  0  0x  Vector Add                                                VADD (floating-point) on page A8-538
1101  0  0  1x  Vector Subtract                                           VSUB (floating-point) on page A8-790
1101  0  1  0x  Vector Pairwise Add                                       VPADD (floating-point) on page A8-686
1101  0  1  1x  Vector Absolute Difference                                VABD (floating-point) on page A8-530
1101  1  0  -   Vector Multiply Accumulate or Subtract                    VMLA, VMLS (floating-point) on page A8-636
1101  1  1  0x  Vector Multiply                                           VMUL (floating-point) on page A8-664
1110  0  0  0x  Vector Compare Equal                                      VCEQ (register) on page A8-552
1110  0  1  0x  Vector Compare Greater Than or Equal                      VCGE (register) on page A8-556
1110  0  1  1x  Vector Compare Greater Than                               VCGT (register) on page A8-560
1110  1  1  -   Vector Absolute Compare Greater or Less Than (or Equal)   VACGE, VACGT, VACLE, VACLT on page A8-534
1111  0  0  -   Vector Maximum or Minimum                                 VMAX, VMIN (floating-point) on page A8-632
1111  0  1  -   Vector Pairwise Maximum or Minimum                        VPMAX, VPMIN (floating-point) on page A8-692
1111  1  0  0x  Vector Reciprocal Step                                    VRECPS on page A8-730
1111  1  0  1x  Vector Reciprocal Square Root Step                        VRSQRTS on page A8-744

A7.4.2 Three registers of different lengths

Thumb encoding (two halfwords, bits 15 to 0 each):
  First halfword:   bits [15:13] = 1 1 1, bit [12] = U, bits [11:8] = 1 1 1 1, bit [7] = 1, bits [5:4] = B
  Second halfword:  bits [11:8] = A, bit [6] = 0, bit [4] = 0

ARM encoding (bits 31 to 0):
  bits [31:25] = 1 1 1 1 0 0 1, bit [24] = U, bit [23] = 1, bits [21:20] = B, bits [11:8] = A, bit [6] = 0, bit [4] = 0

If B == 0b11, see Advanced SIMD data-processing instructions on page A7-10.
Table A7-10 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A7-10 Data-processing instructions with three registers of different lengths

A     U  Instruction                                                      See
000x  -  Vector Add Long or Wide                                          VADDL, VADDW on page A8-542
001x  -  Vector Subtract Long or Wide                                     VSUBL, VSUBW on page A8-794
0100  0  Vector Add and Narrow, returning High Half                       VADDHN on page A8-540
0100  1  Vector Rounding Add and Narrow, returning High Half              VRADDHN on page A8-726
0101  -  Vector Absolute Difference and Accumulate                        VABA, VABAL on page A8-526
0110  0  Vector Subtract and Narrow, returning High Half                  VSUBHN on page A8-792
0110  1  Vector Rounding Subtract and Narrow, returning High Half         VRSUBHN on page A8-748
0111  -  Vector Absolute Difference                                       VABD, VABDL (integer) on page A8-528
10x0  -  Vector Multiply Accumulate or Subtract                           VMLA, VMLAL, VMLS, VMLSL (integer) on page A8-634
10x1  0  Vector Saturating Doubling Multiply Accumulate or Subtract Long  VQDMLAL, VQDMLSL on page A8-702
1100  -  Vector Multiply (integer)                                        VMUL, VMULL (integer and polynomial) on page A8-662
1101  0  Vector Saturating Doubling Multiply Long                         VQDMULL on page A8-706
1110  -  Vector Multiply (polynomial)                                     VMUL, VMULL (integer and polynomial) on page A8-662

A7.4.3 Two registers and a scalar

Thumb encoding (two halfwords, bits 15 to 0 each):
  First halfword:   bits [15:13] = 1 1 1, bit [12] = U, bits [11:8] = 1 1 1 1, bit [7] = 1, bits [5:4] = B
  Second halfword:  bits [11:8] = A, bit [6] = 1, bit [4] = 0

ARM encoding (bits 31 to 0):
  bits [31:25] = 1 1 1 1 0 0 1, bit [24] = U, bit [23] = 1, bits [21:20] = B, bits [11:8] = A, bit [6] = 1, bit [4] = 0

If B == 0b11, see Advanced SIMD data-processing instructions on page A7-10.
Table A7-11 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A7-11 Data-processing instructions with two registers and a scalar

A     U  Instruction                                                       See
0x0x  -  Vector Multiply Accumulate or Subtract                            VMLA, VMLAL, VMLS, VMLSL (by scalar) on page A8-638
0x10  -  Vector Multiply Accumulate or Subtract Long                       VMLA, VMLAL, VMLS, VMLSL (by scalar) on page A8-638
0x11  0  Vector Saturating Doubling Multiply Accumulate or Subtract Long   VQDMLAL, VQDMLSL on page A8-702
100x  -  Vector Multiply                                                   VMUL, VMULL (by scalar) on page A8-666
1010  -  Vector Multiply Long                                              VMUL, VMULL (by scalar) on page A8-666
1011  0  Vector Saturating Doubling Multiply Long                          VQDMULL on page A8-706
1100  -  Vector Saturating Doubling Multiply returning High Half           VQDMULH on page A8-704
1101  -  Vector Saturating Rounding Doubling Multiply returning High Half  VQRDMULH on page A8-712

A7.4.4 Two registers and a shift amount

Thumb encoding (two halfwords, bits 15 to 0 each):
  First halfword:   bits [15:13] = 1 1 1, bit [12] = U, bits [11:8] = 1 1 1 1, bit [7] = 1, bits [5:3] = imm3
  Second halfword:  bits [11:8] = A, bit [7] = L, bit [6] = B, bit [4] = 1

ARM encoding (bits 31 to 0):
  bits [31:25] = 1 1 1 1 0 0 1, bit [24] = U, bit [23] = 1, bits [21:19] = imm3, bits [11:8] = A, bit [7] = L, bit [6] = B, bit [4] = 1

If [L, imm3] == 0b0000, see One register and a modified immediate value on page A7-21.
Table A7-12 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A7-12 Data-processing instructions with two registers and a shift amount

A     U  B  L  Instruction                                             See
0000  -  -  -  Vector Shift Right                                      VSHR on page A8-756
0001  -  -  -  Vector Shift Right and Accumulate                       VSRA on page A8-764
0010  -  -  -  Vector Rounding Shift Right                             VRSHR on page A8-738
0011  -  -  -  Vector Rounding Shift Right and Accumulate              VRSRA on page A8-746
0100  1  -  -  Vector Shift Right and Insert                           VSRI on page A8-766
0101  0  -  -  Vector Shift Left                                       VSHL (immediate) on page A8-750
0101  1  -  -  Vector Shift Left and Insert                            VSLI on page A8-760
011x  -  -  -  Vector Saturating Shift Left                            VQSHL, VQSHLU (immediate) on page A8-720
1000  0  0  0  Vector Shift Right Narrow                               VSHRN on page A8-758
1000  0  1  -  Vector Rounding Shift Right Narrow                      VRSHRN on page A8-740
1000  1  0  -  Vector Saturating Shift Right, Unsigned Narrow          VQSHRN, VQSHRUN on page A8-722
1000  1  1  -  Vector Saturating Shift Right, Rounded Unsigned Narrow  VQRSHRN, VQRSHRUN on page A8-716
1001  -  0  -  Vector Saturating Shift Right, Narrow                   VQSHRN, VQSHRUN on page A8-722
1001  -  1  -  Vector Saturating Shift Right, Rounded Narrow           VQRSHRN, VQRSHRUN on page A8-716
1010  -  0  -  Vector Shift Left Long                                  VSHLL on page A8-754
               Vector Move Long                                        VMOVL on page A8-654
111x  -  -  -  Vector Convert                                          VCVT (between floating-point and fixed-point, Advanced SIMD) on page A8-580

A7.4.5 Two registers, miscellaneous

Thumb encoding (two halfwords, bits 15 to 0 each):
  First halfword:   bits [15:7] = 1 1 1 1 1 1 1 1 1, bits [5:4] = 1 1, bits [1:0] = A
  Second halfword:  bit [11] = 0, bits [10:6] = B, bit [4] = 0

ARM encoding (bits 31 to 0):
  bits [31:23] = 1 1 1 1 0 0 1 1 1, bits [21:20] = 1 1, bits [17:16] = A, bit [11] = 0, bits [10:6] = B, bit [4] = 0

The allocation of encodings in this space is shown in Table A7-13.
Other encodings in this space are UNDEFINED. Table A7-13 Instructions with two registers, miscellaneous AB Instruction 00 0000x Vector Reverse in doublewords 0001x Vector Reverse in words 0010x Vector Reverse in halfwords 010xx Vector Pairwise Add Long 1000x Vector Count Leading Sign Bits 1001x Vector Count Leading Zeros 1010x Vector Count 1011x Vector Bitwise NOT 110xx Vector Pairwise Add and Accumulate Long 1110x Vector Saturating Absolute 1111x Vector Saturating Negate See VREV16, VREV32, VREV64 on page A8-732 VREV16, VREV32, VREV64 on page A8-732 VREV16, VREV32, VREV64 on page A8-732 VPADDL on page A8-688 VCLS on page A8-566 VCLZ on page A8-570 VCNT on page A8-574 VMVN (register) on page A8-670 VPADAL on page A8-682 VQABS on page A8-698 VQNEG on page A8-710 ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A7-19 Advanced SIMD and VFP Instruction Encoding Table A7-13 Instructions with two registers, miscellaneous (continued) AB Instruction See 01 x000x Vector Compare Greater Than Zero VCGT (immediate #0) on page A8-562 x001x Vector Compare Greater Than or Equal to Zero VCGE (immediate #0) on page A8-558 x010x Vector Compare Equal to zero VCEQ (immediate #0) on page A8-554 x011x Vector Compare Less Than or Equal to Zero VCLE (immediate #0) on page A8-564 x100x Vector Compare Less Than Zero VCLT (immediate #0) on page A8-568 x110x Vector Absolute VABS on page A8-532 x111x Vector Negate 10 0000x Vector Swap VNEG on page A8-672 VSWP on page A8-796 0001x Vector Transpose VTRN on page A8-800 0010x Vector Unzip VUZP on page A8-804 0011x Vector Zip VZIP on page A8-806 10 01000 Vector Move and Narrow VMOVN on page A8-656 01001 Vector Saturating Move and Unsigned Narrow VQMOVN, VQMOVUN on page A8-708 0101x Vector Saturating Move and Narrow VQMOVN, VQMOVUN on page A8-708 01100 Vector Shift Left Long (maximum shift) VSHLL on page A8-754 11x00 Vector Convert VCVT (between half-precision and single-precision, Advanced SIMD) on page A8-586 
11  10x0x  Vector Reciprocal Estimate                    VRECPE on page A8-728
11  10x1x  Vector Reciprocal Square Root Estimate        VRSQRTE on page A8-742
11  11xxx  Vector Convert                                VCVT (between floating-point and integer, Advanced SIMD) on page A8-576

A7.4.6 One register and a modified immediate value

Thumb encoding
  First halfword:   bits   15-13  12  11-8  7  6  5-3  2-0
                    field  111    a   1111  1  -  000  bcd
  Second halfword:  bits   15-12  11-8   7  6  5   4  3-0
                    field  -      cmode  0  -  op  1  efgh

ARM encoding
  bits   31-25    24  23  22  21-19  18-16  15-12  11-8   7  6  5   4  3-0
  field  1111001  a   1   -   000    bcd    -      cmode  0  -  op  1  efgh

Table A7-14 shows the allocation of encodings in this space.

Table A7-15 on page A7-22 shows the modified immediate constants available with these instructions, and how they are encoded.

Table A7-14 Data-processing instructions with one register and a modified immediate value

op  cmode  Instruction         See
0   0xx0   Vector Move         VMOV (immediate) on page A8-640
0   0xx1   Vector Bitwise OR   VORR (immediate) on page A8-678
0   10x0   Vector Move         VMOV (immediate) on page A8-640
0   10x1   Vector Bitwise OR   VORR (immediate) on page A8-678
0   11xx   Vector Move         VMOV (immediate) on page A8-640
1   0xx0   Vector Bitwise NOT  VMVN (immediate) on page A8-668
1   0xx1   Vector Bit Clear    VBIC (immediate) on page A8-546
1   10x0   Vector Bitwise NOT  VMVN (immediate) on page A8-668
1   10x1   Vector Bit Clear    VBIC (immediate) on page A8-546
1   110x   Vector Bitwise NOT  VMVN (immediate) on page A8-668
1   1110   Vector Move         VMOV (immediate) on page A8-640
1   1111   UNDEFINED           -

Table A7-15 Modified immediate values for Advanced SIMD instructions

op  cmode  Constant (a)                                                             <dt> (b)  Notes
-   000x   00000000 00000000 00000000 abcdefgh 00000000 00000000 00000000 abcdefgh  I32       c
-   001x   00000000 00000000 abcdefgh 00000000 00000000 00000000 abcdefgh 00000000  I32       c, d
-   010x   00000000 abcdefgh 00000000 00000000 00000000 abcdefgh 00000000 00000000  I32       c, d
-   011x   abcdefgh 00000000 00000000 00000000 abcdefgh 00000000 00000000 00000000  I32       c, d
-   100x   00000000 abcdefgh 00000000 abcdefgh 00000000 abcdefgh 00000000 abcdefgh  I16       c
-   101x   abcdefgh 00000000 abcdefgh 00000000 abcdefgh 00000000 abcdefgh 00000000  I16       c, d
-   1100   00000000 00000000 abcdefgh 11111111 00000000 00000000 abcdefgh 11111111  I32       d, e
-   1101   00000000 abcdefgh 11111111 11111111 00000000 abcdefgh 11111111 11111111  I32       d, e
0   1110   abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh  I8        f
1   1110   aaaaaaaa bbbbbbbb cccccccc dddddddd eeeeeeee ffffffff gggggggg hhhhhhhh  I64       f
0   1111   aBbbbbbc defgh000 00000000 00000000 aBbbbbbc defgh000 00000000 00000000  F32       f, g
1   1111   UNDEFINED                                                                -         -

a. In this table, the immediate value is shown in binary form, to relate abcdefgh to the encoding diagram. In assembler syntax, the constant is specified by a data type and a value of that type. That value is specified in the normal way (a decimal number by default) and is replicated enough times to fill the 64-bit immediate. For example, a data type of I32 and a value of 10 specify the 64-bit constant 0x0000000A0000000A.

b. This specifies the data type used when the instruction is disassembled. On assembly, the data type must be matched in the table if possible. Other data types are permitted as pseudo-instructions when code is assembled, provided the 64-bit constant specified by the data type and value is available for the instruction (if it is available in more than one way, the first entry in this table that can produce it is used).
For example, VMOV.I64 D0,#0x8000000080000000 does not specify a 64-bit constant that is available from the I64 line of the table, but does specify one that is available from the fourth I32 line or the F32 line. It is assembled to the former, and therefore is disassembled as VMOV.I32 D0,#0x80000000.

c. This constant is available for the VBIC, VMOV, VMVN, and VORR instructions.

d. UNPREDICTABLE if abcdefgh == 00000000.

e. This constant is available for the VMOV and VMVN instructions only.

f. This constant is available for the VMOV instruction only.

g. In this entry, B = NOT(b). The bit pattern represents the floating-point number (-1)^S * 2^exp * mantissa, where S = UInt(a), exp = UInt(NOT(b):c:d)-3 and mantissa = (16+UInt(e:f:g:h))/16.

Operation

    // AdvSIMDExpandImm()
    // ==================

    bits(64) AdvSIMDExpandImm(bit op, bits(4) cmode, bits(8) imm8)

        case cmode<3:1> of
            when '000'
                testimm8 = FALSE; imm64 = Replicate(Zeros(24):imm8, 2);
            when '001'
                testimm8 = TRUE; imm64 = Replicate(Zeros(16):imm8:Zeros(8), 2);
            when '010'
                testimm8 = TRUE; imm64 = Replicate(Zeros(8):imm8:Zeros(16), 2);
            when '011'
                testimm8 = TRUE; imm64 = Replicate(imm8:Zeros(24), 2);
            when '100'
                testimm8 = FALSE; imm64 = Replicate(Zeros(8):imm8, 4);
            when '101'
                testimm8 = TRUE; imm64 = Replicate(imm8:Zeros(8), 4);
            when '110'
                testimm8 = TRUE;
                if cmode<0> == '0' then
                    imm64 = Replicate(Zeros(16):imm8:Ones(8), 2);
                else
                    imm64 = Replicate(Zeros(8):imm8:Ones(16), 2);
            when '111'
                testimm8 = FALSE;
                if cmode<0> == '0' && op == '0' then
                    imm64 = Replicate(imm8, 8);
                if cmode<0> == '0' && op == '1' then
                    imm8a = Replicate(imm8<7>, 8); imm8b = Replicate(imm8<6>, 8);
                    imm8c = Replicate(imm8<5>, 8); imm8d = Replicate(imm8<4>, 8);
                    imm8e = Replicate(imm8<3>, 8); imm8f = Replicate(imm8<2>, 8);
                    imm8g = Replicate(imm8<1>, 8); imm8h = Replicate(imm8<0>, 8);
                    imm64 = imm8a:imm8b:imm8c:imm8d:imm8e:imm8f:imm8g:imm8h;
                if cmode<0> == '1' && op == '0' then
                    imm32 = imm8<7>:NOT(imm8<6>):Replicate(imm8<6>,5):imm8<5:0>:Zeros(19);
                    imm64 = Replicate(imm32, 2);
                if cmode<0> == '1' && op == '1' then
                    UNDEFINED;

        if testimm8 && imm8 == '00000000' then
            UNPREDICTABLE;

        return imm64;

A7.5 VFP data-processing instructions

Thumb encoding
  First halfword:   bits   15-13  12  11-8  7-4   3-0
                    field  111    T   1110  opc1  opc2
  Second halfword:  bits   15-12  11-9  8  7-6   5  4  3-0
                    field  -      101   -  opc3  -  0  opc4

ARM encoding
  bits   31-28  27-24  23-20  19-16  15-12  11-9  8  7-6   5  4  3-0
  field  cond   1110   opc1   opc2   -      101   -  opc3  -  0  opc4

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED. Otherwise:
• Table A7-16 shows the encodings for three-register VFP data-processing instructions. Other encodings in this space are UNDEFINED.
• Table A7-17 on page A7-25 applies only if Table A7-16 indicates that it does. It shows the encodings for VFP data-processing instructions with two registers or a register and an immediate. Other encodings in this space are UNDEFINED.
• Table A7-18 on page A7-25 shows the immediate constants available in the VMOV (immediate) instruction.

These instructions are CDP instructions for coprocessors 10 and 11.

Table A7-16 Three-register VFP data-processing instructions

opc1  opc3  Instruction                                    See
0x00  -     Vector Multiply Accumulate or Subtract         VMLA, VMLS (floating-point) on page A8-636
0x01  -     Vector Negate Multiply Accumulate or Subtract  VNMLA, VNMLS, VNMUL on page A8-674
0x10  x1    Vector Negate Multiply Accumulate or Subtract  VNMLA, VNMLS, VNMUL on page A8-674
0x10  x0    Vector Multiply                                VMUL (floating-point) on page A8-664
0x11  x0    Vector Add                                     VADD (integer) on page A8-536
0x11  x1    Vector Subtract                                VSUB (integer) on page A8-788
1x00  x0    Vector Divide                                  VDIV on page A8-590
1x11  -     Other VFP data-processing instructions         Table A7-17 on page A7-25
Table A7-17 Other VFP data-processing instructions

opc2  opc3  Instruction         See
-     x0    Vector Move         VMOV (immediate) on page A8-640
0000  01    Vector Move         VMOV (register) on page A8-642
0000  11    Vector Absolute     VABS on page A8-532
0001  01    Vector Negate       VNEG on page A8-672
0001  11    Vector Square Root  VSQRT on page A8-762
001x  x1    Vector Convert      VCVTB, VCVTT (between half-precision and single-precision, VFP) on page A8-588
010x  x1    Vector Compare      VCMP, VCMPE on page A8-572
0111  11    Vector Convert      VCVT (between double-precision and single-precision) on page A8-584
1000  x1    Vector Convert      VCVT, VCVTR (between floating-point and integer, VFP) on page A8-578
101x  x1    Vector Convert      VCVT (between floating-point and fixed-point, VFP) on page A8-582
110x  x1    Vector Convert      VCVT, VCVTR (between floating-point and integer, VFP) on page A8-578
111x  x1    Vector Convert      VCVT (between floating-point and fixed-point, VFP) on page A8-582

Table A7-18 VFP modified immediate constants

Data type  opc2  opc4  Constant (a)
F32        abcd  efgh  aBbbbbbc defgh000 00000000 00000000
F64        abcd  efgh  aBbbbbbb bbcdefgh 00000000 00000000 00000000 00000000 00000000 00000000

a. In this column, B = NOT(b). The bit pattern represents the floating-point number (-1)^S * 2^exp * mantissa, where S = UInt(a), exp = UInt(NOT(b):c:d)-3 and mantissa = (16+UInt(e:f:g:h))/16.

A7.5.1 Operation

    // VFPExpandImm()
    // ==============

    bits(N) VFPExpandImm(bits(8) imm8, integer N)
        assert N == 32 || N == 64;
        if N == 32 then
            return imm8<7>:NOT(imm8<6>):Replicate(imm8<6>,5):imm8<5:0>:Zeros(19);
        else
            return imm8<7>:NOT(imm8<6>):Replicate(imm8<6>,8):imm8<5:0>:Zeros(48);
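The VFPExpandImm() pseudocode can be tried out with a short Python sketch. This is an illustration only, not part of the architecture definition: the names vfp_expand_imm, bits_to_float and note_a_value are hypothetical, and Python integers stand in for the bits(N) type. The last helper evaluates the formula from note a of Table A7-18, so the two views of the constant can be compared.

```python
import struct


def vfp_expand_imm(imm8, n):
    """Sketch of VFPExpandImm(): expand an 8-bit immediate to an N-bit pattern."""
    if n not in (32, 64):
        raise ValueError("N must be 32 or 64")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    a = (imm8 >> 7) & 1              # imm8<7>, the sign bit
    b = (imm8 >> 6) & 1              # imm8<6>
    low6 = imm8 & 0x3F               # imm8<5:0>
    rep = 5 if n == 32 else 8        # Replicate(imm8<6>, 5) or Replicate(imm8<6>, 8)
    zeros = 19 if n == 32 else 48    # Zeros(19) or Zeros(48)
    value = a
    value = (value << 1) | (b ^ 1)                        # NOT(imm8<6>)
    value = (value << rep) | (((1 << rep) - 1) if b else 0)
    value = (value << 6) | low6
    return value << zeros


def bits_to_float(bits, n):
    """Reinterpret an N-bit pattern as an IEEE 754 single or double value."""
    if n == 32:
        return struct.unpack(">f", struct.pack(">I", bits))[0]
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def note_a_value(imm8):
    """Value given by note a: (-1)^S * 2^exp * mantissa."""
    a, b, c, d = [(imm8 >> i) & 1 for i in (7, 6, 5, 4)]
    exp = (((b ^ 1) << 2) | (c << 1) | d) - 3
    mantissa = (16 + (imm8 & 0xF)) / 16
    return (-1) ** a * 2.0 ** exp * mantissa


print(hex(vfp_expand_imm(0x70, 32)))                 # 0x3f800000, the pattern for 1.0
print(bits_to_float(vfp_expand_imm(0x70, 64), 64))   # 1.0
```

With imm8 = 0x70 the fields are a = 0, b = 1, c = 1, d = 1 and efgh = 0000, so note a gives exp = 0 and mantissa = 1, which matches the expanded bit pattern.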
A7.6 Extension register load/store instructions

Thumb encoding
  First halfword:   bits   15-13  12  11-9  8-4     3-0
                    field  111    T   110   Opcode  Rn
  Second halfword:  bits   15-12  11-9  8-0
                    field  -      101   -

ARM encoding
  bits   31-28  27-25  24-20   19-16  15-12  11-9  8-0
  field  cond   110    Opcode  Rn     -      101   -

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED. Otherwise, the allocation of encodings in this space is shown in Table A7-19. Other encodings in this space are UNDEFINED.

These instructions are LDC and STC instructions for coprocessors 10 and 11.

Table A7-19 Extension register load/store instructions

Opcode  Rn        Instruction                                            See
0010x   -         -                                                      64-bit transfers between ARM core and extension registers on page A7-32
01x00   -         Vector Store Multiple (Increment After, no writeback)  VSTM on page A8-784
01x10   -         Vector Store Multiple (Increment After, writeback)     VSTM on page A8-784
1xx00   -         Vector Store Register                                  VSTR on page A8-786
10x10   not 1101  Vector Store Multiple (Decrement Before, writeback)    VSTM on page A8-784
10x10   1101      Vector Push Registers                                  VPUSH on page A8-696
01x01   -         Vector Load Multiple (Increment After, no writeback)   VLDM on page A8-626
01x11   not 1101  Vector Load Multiple (Increment After, writeback)      VLDM on page A8-626
01x11   1101      Vector Pop Registers                                   VPOP on page A8-694
1xx01   -         Vector Load Register                                   VLDR on page A8-628
10x11   -         Vector Load Multiple (Decrement Before, writeback)     VLDM on page A8-626
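As an illustration of how Table A7-19 is read, the following Python sketch matches the Opcode and Rn fields against the table rows, treating x as a don't-care bit. The names TABLE_A7_19, matches and decode_ext_reg_load_store are hypothetical, the returned strings are just the mnemonics from the See column, and the sketch assumes the T and cond checks described above have already been made.

```python
# Rows of Table A7-19 as (Opcode pattern, Rn condition, result).
# Rn condition: None = any Rn, True = Rn must be 0b1101, False = Rn must not be 0b1101.
TABLE_A7_19 = [
    ("0010x", None, "64-bit transfers between ARM core and extension registers"),
    ("01x00", None, "VSTM"),    # Increment After, no writeback
    ("01x10", None, "VSTM"),    # Increment After, writeback
    ("1xx00", None, "VSTR"),
    ("10x10", False, "VSTM"),   # Decrement Before, writeback
    ("10x10", True, "VPUSH"),
    ("01x01", None, "VLDM"),    # Increment After, no writeback
    ("01x11", False, "VLDM"),   # Increment After, writeback
    ("01x11", True, "VPOP"),
    ("1xx01", None, "VLDR"),
    ("10x11", None, "VLDM"),    # Decrement Before, writeback
]


def matches(value, pattern):
    """Return True if value matches a bit pattern in which 'x' is a don't-care bit."""
    bits = format(value, "0{}b".format(len(pattern)))
    return len(bits) == len(pattern) and all(p in ("x", b) for p, b in zip(pattern, bits))


def decode_ext_reg_load_store(opcode, rn):
    """Look up the 5-bit Opcode field and 4-bit Rn field in Table A7-19."""
    if not (0 <= opcode < 32 and 0 <= rn < 16):
        raise ValueError("Opcode is a 5-bit field and Rn is a 4-bit field")
    for pattern, rn_is_1101, name in TABLE_A7_19:
        if not matches(opcode, pattern):
            continue
        if rn_is_1101 is None or rn_is_1101 == (rn == 0b1101):
            return name
    return "UNDEFINED"  # other encodings in this space are UNDEFINED


print(decode_ext_reg_load_store(0b10010, 0b1101))  # VPUSH
print(decode_ext_reg_load_store(0b10010, 0b0000))  # VSTM
```

The only rows that depend on Rn are the two writeback forms that use the SP (Rn == 1101) as the base register, which are the VPUSH and VPOP cases.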
A7.7 Advanced SIMD element or structure load/store instructions

Thumb encoding
  First halfword:   bits   15-8      7  6  5  4  3-0
                    field  11111001  A  -  L  0  -
  Second halfword:  bits   15-12  11-8  7-0
                    field  -      B     -

ARM encoding
  bits   31-24     23  22  21  20  19-12  11-8  7-0
  field  11110100  A   -   L   0   -      B     -

The allocation of encodings in this space is shown in:
• Table A7-20 if L == 0, store instructions
• Table A7-21 on page A7-28 if L == 1, load instructions.

Other encodings in this space are UNDEFINED.

The variable bits are in identical locations in the two encodings, after adjusting for the fact that the ARM encoding is held in memory as a single word and the Thumb encoding is held as two consecutive halfwords.

The ARM instructions can only be executed unconditionally. The Thumb instructions can be executed conditionally by using the IT instruction. For details see IT on page A8-104.

Table A7-20 Element and structure store instructions (L == 0)

A  B                 Instruction   See
0  0010, 011x, 1010  Vector Store  VST1 (multiple single elements) on page A8-768
0  0011, 100x        Vector Store  VST2 (multiple 2-element structures) on page A8-772
0  010x              Vector Store  VST3 (multiple 3-element structures) on page A8-776
0  000x              Vector Store  VST4 (multiple 4-element structures) on page A8-780
1  0x00, 1000        Vector Store  VST1 (single element from one lane) on page A8-770
1  0x01, 1001        Vector Store  VST2 (single 2-element structure from one lane) on page A8-774
1  0x10, 1010        Vector Store  VST3 (single 3-element structure from one lane) on page A8-778
1  0x11, 1011        Vector Store  VST4 (single 4-element structure from one lane) on page A8-782

Table A7-21 Element and structure load instructions (L == 1)

A  B                 Instruction  See
0  0010, 011x, 1010  Vector Load  VLD1 (multiple single elements) on page A8-602
0  0011, 100x        Vector Load  VLD2 (multiple 2-element structures) on page A8-608
0  010x              Vector Load  VLD3 (multiple 3-element structures) on page A8-614
0  000x              Vector Load  VLD4 (multiple 4-element structures) on page A8-620
1  0x00, 1000        Vector Load  VLD1 (single element to one lane) on page A8-604
1  1100              Vector Load  VLD1 (single element to all lanes) on page A8-606
1  0x01, 1001        Vector Load  VLD2 (single 2-element structure to one lane) on page A8-610
1  1101              Vector Load  VLD2 (single 2-element structure to all lanes) on page A8-612
1  0x10, 1010        Vector Load  VLD3 (single 3-element structure to one lane) on page A8-616
1  1110              Vector Load  VLD3 (single 3-element structure to all lanes) on page A8-618
1  0x11, 1011        Vector Load  VLD4 (single 4-element structure to one lane) on page A8-622
1  1111              Vector Load  VLD4 (single 4-element structure to all lanes) on page A8-624

A7.7.1 Advanced SIMD addressing mode

All the element and structure load/store instructions use this addressing mode.
There is a choice of three formats:

[<Rn>{@<align>}]
    The address is contained in ARM core register Rn. Rn is not updated by this instruction.
    Encoded as Rm = 0b1111. If Rn is encoded as 0b1111, the instruction is UNPREDICTABLE.

[<Rn>{@<align>}]!
    The address is contained in ARM core register Rn. Rn is updated by this instruction:
        Rn = Rn + transfer_size
    Encoded as Rm = 0b1101. transfer_size is the number of bytes transferred by the instruction. This means that, after the instruction is executed, Rn points to the address in memory immediately following the last address loaded from or stored to. If Rn is encoded as 0b1111, the instruction is UNPREDICTABLE.
    This addressing mode can also be written as:
        [<Rn>{@align}], #<transfer_size>
    However, disassembly produces the [<Rn>{@align}]! form.

[<Rn>{@<align>}], <Rm>
    The address is contained in ARM core register Rn. Rn is updated by this instruction:
        Rn = Rn + Rm
    Encoded as Rm = Rm. Rm must not be encoded as 0b1111 or 0b1101 (the PC or the SP). If Rn is encoded as 0b1111, the instruction is UNPREDICTABLE.

In all cases, <align> specifies an optional alignment. Details are given in the individual instruction descriptions.

A7.8 8, 16, and 32-bit transfer between ARM core and extension registers

Thumb encoding
  First halfword:   bits   15-13  12  11-8  7-5  4  3-0
                    field  111    T   1110  A    L  -
  Second halfword:  bits   15-12  11-9  8  7  6-5  4  3-0
                    field  -      101   C  -  B    1  -

ARM encoding
  bits   31-28  27-24  23-21  20  19-12  11-9  8  7  6-5  4  3-0
  field  cond   1110   A      L   -      101   C  -  B    1  -

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED. Otherwise, the allocation of encodings in this space is shown in Table A7-22. Other encodings in this space are UNDEFINED.

These instructions are MRC and MCR instructions for coprocessors 10 and 11.
Table A7-22 8-bit, 16-bit and 32-bit data transfer instructions

L  C  A    B   Instruction                                          See
0  0  000  -   Vector Move                                          VMOV (between ARM core register and single-precision register) on page A8-648
0  0  111  -   Move to VFP Special Register from ARM core register  VMSR on page A8-660, VMSR on page B6-29 (System level view)
0  1  0xx  -   Vector Move                                          VMOV (ARM core register to scalar) on page A8-644
0  1  1xx  0x  Vector Duplicate                                     VDUP (ARM core register) on page A8-594
1  0  000  -   Vector Move                                          VMOV (between ARM core register and single-precision register) on page A8-648
1  0  111  -   Move to ARM core register from VFP Special Register  VMRS on page A8-658, VMRS on page B6-27 (System level view)
1  1  xxx  -   Vector Move                                          VMOV (scalar to ARM core register) on page A8-646

A7.9 64-bit transfers between ARM core and extension registers

Thumb encoding
  First halfword:   bits   15-13  12  11-5     4-0
                    field  111    T   1100010  -
  Second halfword:  bits   15-12  11-9  8  7-4  3-0
                    field  -      101   C  op   -

ARM encoding
  bits   31-28  27-21    20-12  11-9  8  7-4  3-0
  field  cond   1100010  -      101   C  op   -

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED. Otherwise, the allocation of encodings in this space is shown in Table A7-23. Other encodings in this space are UNDEFINED.

These instructions are MRRC and MCRR instructions for coprocessors 10 and 11.

Table A7-23 64-bit data transfer instructions

C  op    Instruction
0  00x1  VMOV (between two ARM core registers and two single-precision registers) on page A8-650
1  00x1  VMOV (between two ARM core registers and a doubleword extension register) on page A8-652

Chapter A8 Instruction Details

This chapter describes each instruction.
It contains the following sections:
• Format of instruction descriptions on page A8-2
• Standard assembler syntax fields on page A8-7
• Conditional execution on page A8-8
• Shifts applied to a register on page A8-10
• Memory accesses on page A8-13
• Alphabetical list of instructions on page A8-14.

A8.1 Format of instruction descriptions

The instruction descriptions in Alphabetical list of instructions on page A8-14 normally use the following format:
• instruction section title
• introduction to the instruction
• instruction encoding(s) with architecture information
• assembler syntax
• pseudocode describing how the instruction operates
• exception information
• notes (where applicable).

Each of these items is described in more detail in the following subsections.

A few instruction descriptions describe alternative mnemonics for other instructions and use an abbreviated and modified version of this format.

A8.1.1 Instruction section title

The instruction section title gives the base mnemonic for the instructions described in the section. When one mnemonic has multiple forms described in separate instruction sections, this is followed by a short description of the form in parentheses. The most common use of this is to distinguish between forms of an instruction in which one of the operands is an immediate value and forms in which it is a register.

Parenthesized text is also used to document the former mnemonic in some cases where a mnemonic has been replaced entirely by another mnemonic in the new assembler syntax.

A8.1.2 Introduction to the instruction

The instruction section title is followed by text that briefly describes the main features of the instruction. This description is not necessarily complete and is not definitive. If there is any conflict between it and the more detailed information that follows, the latter takes priority.
A8.1.3 Instruction encodings

This is a list of one or more instruction encodings. Each instruction encoding is labelled as:
• T1, T2, T3 … for the first, second, third and any additional Thumb encodings
• A1, A2, A3 … for the first, second, third and any additional ARM encodings
• E1, E2, E3 … for the first, second, third and any additional ThumbEE encodings that are not also Thumb encodings.

Where Thumb and ARM encodings are very closely related, the two encodings are described together, for example as encoding T1 / A1.

Each instruction encoding description consists of:

• Information about which architecture variants include the particular encoding of the instruction. This is presented in one of two ways:
  - For instruction encodings that are in the main instruction set architecture, as a list of the architecture variants that include the encoding. See Architecture versions, profiles, and variants on page A1-4 for a summary of these variants.
  - For instruction encodings that are in the architecture extensions, as a list of the architecture extensions that include the encoding. See Architecture extensions on page A1-6 for a summary of the architecture extensions and the architecture variants that they can extend.
  In architecture variant lists:
  - ARMv7 means ARMv7-A and ARMv7-R profiles. The architecture variant information in this manual does not cover the ARMv7-M profile.
  - * is used as a wildcard. For example, ARMv5T* means ARMv5T, ARMv5TE, and ARMv5TEJ.

• An assembly syntax that ensures that the assembler selects the encoding in preference to any other encoding. In some cases, multiple syntaxes are given. The correct one to use is sometimes indicated by annotations to the syntax, such as Inside IT block and Outside IT block.
  In other cases, the correct one to use can be determined by looking at the assembler syntax description and using it to determine which syntax corresponds to the instruction being disassembled.

  There is usually more than one syntax that ensures re-assembly to any particular encoding, and the exact set of syntaxes that do so usually depends on the register numbers, immediate constants and other operands to the instruction. For example, when assembling to the Thumb instruction set, the syntax AND R0,R0,R8 ensures selection of a 32-bit encoding but AND R0,R0,R1 selects a 16-bit encoding.

  The assembly syntax documented for the encoding is chosen to be the simplest one that ensures selection of that encoding for all operand combinations supported by that encoding. This often means that it includes elements that are only necessary for a small subset of operand combinations. For example, the assembler syntax documented for the 32-bit Thumb AND (register) encoding includes the .W qualifier to ensure that the 32-bit encoding is selected even for the small proportion of operand combinations for which the 16-bit encoding is also available.

  The assembly syntax given for an encoding is therefore a suitable one for a disassembler to disassemble that encoding to. However, disassemblers might wish to use simpler syntaxes when they are suitable for the operand combination, in order to produce more readable disassembled code.

• An encoding diagram, or a Thumb encoding diagram followed by an ARM encoding diagram when they are being described together. This is half-width for 16-bit Thumb encodings and full-width for 32-bit Thumb and ARM encodings. The 32-bit Thumb encodings use a double vertical line between the two halfwords of the instruction to distinguish them from ARM encodings and to act as a reminder that 32-bit Thumb instructions consist of two consecutive halfwords rather than a word.
  In particular, if instructions are stored using the standard little-endian instruction endianness, the encoding diagram for an ARM instruction at address A shows the bytes at addresses A+3, A+2, A+1, A from left to right, but the encoding diagram for a 32-bit Thumb instruction shows them in the order A+1, A for the first halfword, followed by A+3, A+2 for the second halfword.

• Encoding-specific pseudocode. This is pseudocode that translates the encoding-specific instruction fields into inputs to the encoding-independent pseudocode in the later Operation subsection, and that picks out any special cases in the encoding. For a detailed description of the pseudocode used and of the relationship between the encoding diagram, the encoding-specific pseudocode and the encoding-independent pseudocode, see Appendix I Pseudocode Definition.

A8.1.4 Assembler syntax

The Assembler syntax subsection describes the standard UAL syntax for the instruction.

Each syntax description consists of the following elements:

• One or more syntax prototype lines written in a typewriter font, using the conventions described in Assembler syntax prototype line conventions on page A8-5. Each prototype line documents the mnemonic and (where appropriate) operand parts of a full line of assembler code. When there is more than one such line, each prototype line is annotated to indicate required results of the encoding-specific pseudocode. For each instruction encoding, this information can be used to determine whether any instructions matching that encoding are available when assembling that syntax, and if so, which ones.

• The line where: followed by descriptions of all of the variable or optional fields of the prototype syntax line.

  Some syntax fields are standardized across all or most instructions. Standard assembler syntax fields on page A8-7 describes these fields.
  By default, syntax fields that specify registers, such as <Rd>, <Rn>, or <Rt>, can be any of R0-R12 or LR in Thumb instructions, and any of R0-R12, SP or LR in ARM instructions. These require that the encoding-specific pseudocode set the corresponding integer variable (such as d, n, or t) to the corresponding register number (0-12 for R0-R12, 13 for SP, 14 for LR). This can normally be done by setting the corresponding bitfield in the instruction (named Rd, Rn, Rt…) to the binary encoding of that number. In the case of 16-bit Thumb encodings, this bitfield is normally of length 3 and so the encoding is only available when one of R0-R7 is specified in the assembler syntax. It is also common for such encodings to use a bitfield name such as Rdn. This indicates that the encoding is only available if <Rd> and <Rn> specify the same register, and that the register number of that register is encoded in the bitfield if they do.

  The description of a syntax field that specifies a register sometimes extends or restricts the permitted range of registers or documents other differences from the default rules for such fields. Typical extensions are to permit the use of the SP in Thumb instructions and to permit the use of the PC (using register number 15).

• Where appropriate, text that briefly describes changes from the pre-UAL ARM assembler syntax. Where present, this usually consists of an alternative pre-UAL form of the assembler mnemonic. The pre-UAL ARM assembler syntax does not conflict with UAL, and support for it is a recommended optional extension to UAL, to enable the assembly of pre-UAL ARM assembler source files.

Note
The pre-UAL Thumb assembler syntax is incompatible with UAL and is not documented in the instruction sections. For details see Appendix C Legacy Instruction Mnemonics.
Assembler syntax prototype line conventions

The following conventions are used in assembler syntax prototype lines and their subfields:

< >
    Any item bracketed by < and > is a short description of a type of value to be supplied by the user in that position. A longer description of the item is normally supplied by subsequent text. Such items often correspond to a similarly named field in an encoding diagram for an instruction. When the correspondence simply requires the binary encoding of an integer value or register number to be substituted into the instruction encoding, it is not described explicitly. For example, if the assembler syntax for an ARM instruction contains an item <Rn> and the instruction encoding diagram contains a 4-bit field named Rn, the number of the register specified in the assembler syntax is encoded in binary in the instruction field.
    If the correspondence between the assembler syntax item and the instruction encoding is more complex than simple binary encoding of an integer or register number, the item description indicates how it is encoded. This is often done by specifying a required output from the encoding-specific pseudocode, such as add = TRUE. The assembler must only use encodings that produce that output.

{ }
    Any item bracketed by { and } is optional. A description of the item and of how its presence or absence is encoded in the instruction is normally supplied by subsequent text.
    Many instructions have an optional destination register. Unless otherwise stated, if such a destination register is omitted, it is the same as the immediately following source register in the instruction syntax.

spaces
    Single spaces are used for clarity, to separate items. When a space is obligatory in the assembler syntax, two or more consecutive spaces are used.

+/-
    This indicates an optional + or - sign. If neither is coded, + is assumed.

All other characters must be encoded precisely as they appear in the assembler syntax.
Apart from { and }, the special characters described above do not appear in the basic forms of assembler instructions documented in this manual. The { and } characters need to be encoded in a few places as part of a variable item. When this happens, the long description of the variable item indicates how they must be used.

A8.1.5 Pseudocode describing how the instruction operates

The Operation subsection contains encoding-independent pseudocode that describes the main operation of the instruction. For a detailed description of the pseudocode used and of the relationship between the encoding diagram, the encoding-specific pseudocode and the encoding-independent pseudocode, see Appendix I Pseudocode Definition.

A8.1.6 Exception information

The Exceptions subsection contains a list of the exceptional conditions that can be caused by execution of the instruction.

Processor exceptions are listed as follows:

• Resets and interrupts (both IRQs and FIQs) are not listed. They can occur before or after the execution of any instruction, and in some cases during the execution of an instruction, but they are not in general caused by the instruction concerned.

• Prefetch Abort exceptions are normally caused by a memory abort when an instruction is fetched, followed by an attempt to execute that instruction. This can happen for any instruction, but is caused by the aborted attempt to fetch the instruction rather than by the instruction itself, and so is not listed. A special case is the BKPT instruction, which is defined as causing a Prefetch Abort exception in some circumstances.

• Data Abort exceptions are listed for all instructions that perform data memory accesses.

• Undefined Instruction exceptions are listed when they are part of the effects of a defined instruction.
  For example, all coprocessor instructions are defined to produce the Undefined Instruction exception if not accepted by their coprocessor.

  Undefined Instruction exceptions caused by the execution of an UNDEFINED instruction are not listed, even when the UNDEFINED instruction is a special case of one or more of the encodings of the instruction. Such special cases are instead indicated in the encoding-specific pseudocode for the encoding.

• Supervisor Call and Secure Monitor Call exceptions are listed for the SVC and SMC instructions respectively. Supervisor Call exceptions and the SVC instruction were previously called Software Interrupt exceptions and the SWI instruction. Secure Monitor Call exceptions and the SMC instruction were previously called Secure Monitor interrupts and the SMI instruction.

Floating-point exceptions are listed for instructions that can produce them. Floating-point exceptions on page A2-42 describes these exceptions. They do not normally result in processor exceptions.

A8.1.7 Notes

Where appropriate, other notes about the instruction appear under additional subheadings.

Note
Information that was documented in notes in previous versions of the ARM Architecture Reference Manual and its supplements has often been moved elsewhere. For example, operand restrictions on the values of bitfields in an instruction encoding are now normally documented in the encoding-specific pseudocode for that encoding.

A8.2 Standard assembler syntax fields

The following assembler syntax fields are standard across all or most instructions:

<c>
    Is an optional field. It specifies the condition under which the instruction is executed. See Conditional execution on page A8-8 for the range of available conditions and their encoding. If <c> is omitted, it defaults to always (AL).

<q>
    Specifies optional assembler qualifiers on the instruction.
The following qualifiers are defined:

.N Meaning narrow, specifies that the assembler must select a 16-bit encoding for the instruction. If this is not possible, an assembler error is produced.

.W Meaning wide, specifies that the assembler must select a 32-bit encoding for the instruction. If this is not possible, an assembler error is produced.

If neither .W nor .N is specified, the assembler can select either 16-bit or 32-bit encodings. If both are available, it must select a 16-bit encoding. In a few cases, more than one encoding of the same length can be available for an instruction. The rules for selecting between such encodings are instruction-specific and are part of the instruction description.

Note
When assembling to the ARM instruction set, the .N qualifier produces an assembler error and the .W qualifier has no effect.

Although the instruction descriptions throughout this manual show the <c> and <q> fields without { } around them, these fields are optional as described in this section.

A8.3 Conditional execution

Most ARM instructions, and most Thumb instructions from ARMv6T2 onwards, can be executed conditionally, based on the values of the APSR condition flags. Before ARMv6T2, the only conditional Thumb instruction was the 16-bit conditional branch instruction. Table A8-1 lists the available conditions.

In Thumb instructions, the condition (if it is not AL) is normally encoded in a preceding IT instruction. For details see Conditional instructions on page A4-4 and IT on page A8-104. Some conditional branch instructions do not require a preceding IT instruction, and include a condition code in their encoding.

In ARM instructions, bits [31:28] of the instruction contain the condition, or contain 1111 for some ARM instructions that can only be executed unconditionally.
Table A8-1 Condition codes

cond  Mnemonic extension  Meaning (integer)             Meaning (floating-point) (a)       Condition flags
0000  EQ                  Equal                         Equal                              Z == 1
0001  NE                  Not equal                     Not equal, or unordered            Z == 0
0010  CS (b)              Carry set                     Greater than, equal, or unordered  C == 1
0011  CC (c)              Carry clear                   Less than                          C == 0
0100  MI                  Minus, negative               Less than                          N == 1
0101  PL                  Plus, positive or zero        Greater than, equal, or unordered  N == 0
0110  VS                  Overflow                      Unordered                          V == 1
0111  VC                  No overflow                   Not unordered                      V == 0
1000  HI                  Unsigned higher               Greater than, or unordered         C == 1 and Z == 0
1001  LS                  Unsigned lower or same        Less than or equal                 C == 0 or Z == 1
1010  GE                  Signed greater than or equal  Greater than or equal              N == V
1011  LT                  Signed less than              Less than, or unordered            N != V
1100  GT                  Signed greater than           Greater than                       Z == 0 and N == V
1101  LE                  Signed less than or equal     Less than, equal, or unordered     Z == 1 or N != V
1110  None (AL) (d)       Always (unconditional)        Always (unconditional)             Any

a. Unordered means at least one NaN operand.
b. HS (unsigned higher or same) is a synonym for CS.
c. LO (unsigned lower) is a synonym for CC.
d. AL is an optional mnemonic extension for always, except in IT instructions. For details see IT on page A8-104.

A8.3.1 Pseudocode details of conditional execution

The CurrentCond() pseudocode function has prototype:

bits(4) CurrentCond()

and returns a 4-bit condition specifier as follows:
• For ARM instructions, it returns bits[31:28] of the instruction.
• For the T1 and T3 encodings of the Branch instruction (see B on page A8-44), it returns the 4-bit 'cond' field of the encoding.
• For all other Thumb and ThumbEE instructions, it returns ITSTATE.IT<7:4>. See ITSTATE on page A2-17.
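As an illustration of Table A8-1, the following is a minimal table-driven sketch in Python, not part of the manual. The names CONDITIONS, arm_cond_field, and condition_holds are hypothetical, the flags are passed as plain integers (0 or 1), and only the fifteen codes listed in the table are covered. The helper arm_cond_field models only the ARM-instruction case of CurrentCond(), that is, bits[31:28] of the instruction word.

```python
# Hypothetical sketch: each entry pairs a mnemonic extension from Table A8-1
# with a predicate over the N, Z, C, V flags (each passed as 0 or 1).
CONDITIONS = {
    0b0000: ("EQ", lambda n, z, c, v: z == 1),
    0b0001: ("NE", lambda n, z, c, v: z == 0),
    0b0010: ("CS", lambda n, z, c, v: c == 1),
    0b0011: ("CC", lambda n, z, c, v: c == 0),
    0b0100: ("MI", lambda n, z, c, v: n == 1),
    0b0101: ("PL", lambda n, z, c, v: n == 0),
    0b0110: ("VS", lambda n, z, c, v: v == 1),
    0b0111: ("VC", lambda n, z, c, v: v == 0),
    0b1000: ("HI", lambda n, z, c, v: c == 1 and z == 0),
    0b1001: ("LS", lambda n, z, c, v: c == 0 or z == 1),
    0b1010: ("GE", lambda n, z, c, v: n == v),
    0b1011: ("LT", lambda n, z, c, v: n != v),
    0b1100: ("GT", lambda n, z, c, v: z == 0 and n == v),
    0b1101: ("LE", lambda n, z, c, v: z == 1 or n != v),
    0b1110: ("AL", lambda n, z, c, v: True),
}


def arm_cond_field(instruction: int) -> int:
    """Return bits[31:28] of a 32-bit ARM instruction word."""
    return (instruction >> 28) & 0xF


def condition_holds(cond: int, n: int, z: int, c: int, v: int) -> bool:
    """Look up a 4-bit condition code and evaluate it against the flags."""
    _mnemonic, predicate = CONDITIONS[cond]
    return predicate(n, z, c, v)
```

A lookup table keeps the correspondence with the printed table easy to audit row by row. The manual's own ConditionPassed() pseudocode instead evaluates cond<3:1> and inverts the result when cond<0> is set.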
The ConditionPassed() function uses this condition specifier and the APSR condition flags to determine whether the instruction must be executed:

// ConditionPassed()
// =================

boolean ConditionPassed()
    cond = CurrentCond();

    // Evaluate base condition.
    case cond<3:1> of
        when ‘000’ result = (APSR.Z == ‘1’);                          // EQ or NE
        when ‘001’ result = (APSR.C == ‘1’);                          // CS or CC
        when ‘010’ result = (APSR.N == ‘1’);                          // MI or PL
        when ‘011’ result = (APSR.V == ‘1’);                          // VS or VC
        when ‘100’ result = (APSR.C == ‘1’) && (APSR.Z == ‘0’);       // HI or LS
        when ‘101’ result = (APSR.N == APSR.V);                       // GE or LT
        when ‘110’ result = (APSR.N == APSR.V) && (APSR.Z == ‘0’);    // GT or LE
        when ‘111’ result = TRUE;                                     // AL

    // Condition bits ‘111x’ indicate the instruction is always executed. Otherwise,
    // invert condition if necessary.
    if cond<0> == ‘1’ && cond != ‘1111’ then
        result = !result;

    return result;

A8.4 Shifts applied to a register

ARM register offset load/store word and unsigned byte instructions can apply a wide range of different constant shifts to the offset register. Both Thumb and ARM data-processing instructions can apply the same range of different constant shifts to the second operand register. For details see Constant shifts.

ARM data-processing instructions can apply a register-controlled shift to the second operand register.

A8.4.1 Constant shifts

These are the same in Thumb and ARM instructions, except that the input bits come from different positions.

<shift> is an optional shift to be applied to <Rm>. It can be any one of:

(omitted)  No shift.
LSL #<n>   Logical shift left <n> bits. 1 <= <n> <= 31.
LSR #<n>   Logical shift right <n> bits. 1 <= <n> <= 32.
ASR #<n>   Arithmetic shift right <n> bits. 1 <= <n> <= 32.
ROR #<n>   Rotate right <n> bits. 1 <= <n> <= 31.
RRX        Rotate right one bit, with extend. Bit [0] is written to shifter_carry_out, bits [31:1] are shifted right one bit, and the Carry Flag is shifted into bit [31].
Note
Assemblers can permit the use of some or all of ASR #0, LSL #0, LSR #0, and ROR #0 to specify that no shift is to be performed. This is not standard UAL, and the encoding selected for Thumb instructions might vary between UAL assemblers if it is used. To ensure disassembled code assembles to the original instructions, disassemblers must omit the shift specifier when the instruction specifies no shift.

Similarly, assemblers can permit the use of #0 in the immediate forms of ASR, LSL, LSR, and ROR instructions to specify that no shift is to be performed, that is, that a MOV (register) instruction is wanted. Again, this is not standard UAL, and the encoding selected for Thumb instructions might vary between UAL assemblers if it is used. To ensure disassembled code assembles to the original instructions, disassemblers must use the MOV (register) syntax when the instruction specifies no shift.

Encoding

The assembler encodes <shift> into two type bits and five immediate bits, as follows:

(omitted)  type = 0b00, immediate = 0.
LSL #<n>   type = 0b00, immediate = <n>.
LSR #<n>   type = 0b01. If <n> < 32, immediate = <n>. If <n> == 32, immediate = 0.
ASR #<n>   type = 0b10. If <n> < 32, immediate = <n>. If <n> == 32, immediate = 0.
ROR #<n>   type = 0b11, immediate = <n>.
RRX        type = 0b11, immediate = 0.

A8.4.2 Register controlled shifts

These are only available in ARM instructions.

<type> is the type of shift to apply to the value read from <Rm>. It must be one of:

ASR  Arithmetic shift right, encoded as type = 0b10
LSL  Logical shift left, encoded as type = 0b00
LSR  Logical shift right, encoded as type = 0b01
ROR  Rotate right, encoded as type = 0b11.

The bottom byte of <Rs> contains the shift amount.
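The constant-shift encoding rules listed under Encoding in A8.4.1 can be sketched as a small Python function. This is an illustrative sketch only: the function name encode_imm_shift, the SHIFT_TYPE and SHIFT_RANGE tables, and the use of strings for the shift mnemonics are assumptions made for the example, not anything the manual defines.

```python
# Hypothetical sketch of the (type, immediate) encoding for constant shifts.
SHIFT_TYPE = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}

# Permitted shift amounts for each mnemonic, as listed in A8.4.1.
SHIFT_RANGE = {"LSL": (1, 31), "LSR": (1, 32), "ASR": (1, 32), "ROR": (1, 31)}


def encode_imm_shift(op=None, amount=0):
    """Return (type, immediate) for an optional constant shift.

    op is None when the shift is omitted, "RRX" for rotate right with
    extend, or one of "LSL", "LSR", "ASR", "ROR" with an amount.
    """
    if op is None:
        return (0b00, 0)           # omitted: same bits as LSL with immediate 0
    if op == "RRX":
        return (0b11, 0)           # RRX shares type 0b11 with ROR
    low, high = SHIFT_RANGE[op]
    if not low <= amount <= high:
        raise ValueError(f"{op} #{amount} is outside the range {low}-{high}")
    # A shift amount of 32 (LSR and ASR only) is encoded as immediate 0.
    return (SHIFT_TYPE[op], amount % 32)
```

For example, encode_imm_shift("LSR", 32) yields (0b01, 0), which is why a decoder has to treat an immediate of 0 as a shift of 32 for LSR and ASR, and as RRX for type 0b11.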
A8.4.3 Pseudocode details of instruction-specified shifts and rotates

enumeration SRType (SRType_LSL, SRType_LSR, SRType_ASR, SRType_ROR, SRType_RRX);

// DecodeImmShift()
// ================

(SRType, integer) DecodeImmShift(bits(2) type, bits(5) imm5)

    case type of
        when ‘00’
            shift_t = SRType_LSL; shift_n = UInt(imm5);
        when ‘01’
            shift_t = SRType_LSR; shift_n = if imm5 == ‘00000’ then 32 else UInt(imm5);
        when ‘10’
            shift_t = SRType_ASR; shift_n = if imm5 == ‘00000’ then 32 else UInt(imm5);
        when ‘11’
            if imm5 == ‘00000’ then
                shift_t = SRType_RRX; shift_n = 1;
            else
                shift_t = SRType_ROR; shift_n = UInt(imm5);

    return (shift_t, shift_n);

// DecodeRegShift()
// ================

SRType DecodeRegShift(bits(2) type)
    case type of
        when ‘00’ shift_t = SRType_LSL;
        when ‘01’ shift_t = SRType_LSR;
        when ‘10’ shift_t = SRType_ASR;
        when ‘11’ shift_t = SRType_ROR;
    return shift_t;

// Shift()
// =======

bits(N) Shift(bits(N) value, SRType type, integer amount, bit carry_in)
    (result, -) = Shift_C(value, type, amount, carry_in);
    return result;

// Shift_C()
// =========

(bits(N), bit) Shift_C(bits(N) value, SRType type, integer amount, bit carry_in)
    assert !(type == SRType_RRX && amount != 1);

    if amount == 0 then
        (result, carry_out) = (value, carry_in);
    else
        case type of
            when SRType_LSL
                (result, carry_out) = LSL_C(value, amount);
            when SRType_LSR
                (result, carry_out) = LSR_C(value, amount);
            when SRType_ASR
                (result, carry_out) = ASR_C(value, amount);
            when SRType_ROR
                (result, carry_out) = ROR_C(value, amount);
            when SRType_RRX
                (result, carry_out) = RRX_C(value, carry_in);

    return (result, carry_out);
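To make the behaviour of Shift_C() concrete, here is a minimal Python sketch for 32-bit values. It is not the manual's definition: the helpers LSL_C, LSR_C, ASR_C, ROR_C, and RRX_C are defined elsewhere in the manual, and the bodies below are written on the assumption that they behave as ordinary logical, arithmetic, and rotate operations with the last bit shifted out as the carry. The name shift_c and the string shift kinds are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shift_c(value, kind, amount, carry_in):
    """Return (result, carry_out) for a 32-bit shift or rotate.

    kind is one of "LSL", "LSR", "ASR", "ROR", "RRX"; carry_in is 0 or 1.
    """
    # RRX is only meaningful with a shift amount of 1.
    assert not (kind == "RRX" and amount != 1)
    value &= MASK32
    if amount == 0:
        return value, carry_in                    # no shift: carry unchanged
    if kind == "LSL":
        extended = value << amount
        return extended & MASK32, (extended >> 32) & 1
    if kind == "LSR":
        return value >> amount, (value >> (amount - 1)) & 1
    if kind == "ASR":
        # Reinterpret as signed so that Python's >> replicates the sign bit.
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32, (signed >> (amount - 1)) & 1
    if kind == "ROR":
        m = amount % 32
        result = ((value >> m) | (value << (32 - m))) & MASK32
        return result, result >> 31               # carry is bit 31 of result
    if kind == "RRX":
        return (carry_in << 31) | (value >> 1), value & 1
    raise ValueError(f"unknown shift kind: {kind}")
```

Python integers are unbounded, so the sketch masks to 32 bits explicitly. That also lets shift amounts of 32 or more, which can occur with register-controlled shifts, fall out of the same expressions.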
A8.5 Memory accesses

Commonly, the following addressing modes are permitted for memory access instructions:

Offset addressing
The offset value is applied to an address obtained from the base register. The result is used as the address for the memory access. The value of the base register is unchanged. The assembly language syntax for this mode is:
[<Rn>,<offset>]

Pre-indexed addressing
The offset value is applied to an address obtained from the base register. The result is used as the address for the memory access, and written back into the base register. The assembly language syntax for this mode is:
[<Rn>,<offset>]!

Post-indexed addressing
The address obtained from the base register is used, unchanged, as the address for the memory access. The offset value is applied to the address, and written back into the base register. The assembly language syntax for this mode is:
[<Rn>],<offset>

In each case, <Rn> is the base register. <offset> can be:
• an immediate constant, such as <imm8> or <imm12>
• an index register, <Rm>
• a shifted index register, such as <Rm>, LSL #<shift>.

For information about unaligned access, endianness, and exclusive access, see:
• Alignment support on page A3-4
• Endian support on page A3-7
• Synchronization and semaphores on page A3-12.

A8.6 Alphabetical list of instructions

Every instruction is listed in this section. For details of the format used see Format of instruction descriptions on page A8-2.

A8.6.1 ADC (immediate)

Add with Carry (immediate) adds an immediate value and the carry flag value to a register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.
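The role of the carry flag in ADC is easiest to see in multi-word arithmetic. The Python sketch below is illustrative only. AddWithCarry() is defined elsewhere in the manual, so add_with_carry here is written on the assumption that it returns the truncated 32-bit sum, an unsigned carry-out, and a signed overflow bit. The helper add64 is a hypothetical model of an ADDS followed by an ADC on the low and high words of two 64-bit operands.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def add_with_carry(x, y, carry_in):
    """Return (result, carry_out, overflow) for 32-bit x + y + carry_in."""
    unsigned_sum = x + y + carry_in
    signed_sum = to_signed(x) + to_signed(y) + carry_in
    result = unsigned_sum & MASK32
    carry_out = 0 if result == unsigned_sum else 1        # unsigned overflow
    overflow = 0 if to_signed(result) == signed_sum else 1  # signed overflow
    return result, carry_out, overflow


def add64(a, b):
    """Add two 64-bit values one 32-bit word at a time."""
    # Low words: plain add with carry-in 0, keeping the carry-out (like ADDS).
    low, carry, _ = add_with_carry(a & MASK32, b & MASK32, 0)
    # High words: add with the carry from the low words (like ADC).
    high, _, _ = add_with_carry((a >> 32) & MASK32, (b >> 32) & MASK32, carry)
    return (high << 32) | low
```

Here add64(0x00000001FFFFFFFF, 1) carries out of the low word into the high word, which is exactly the case that a plain ADD on the high words would get wrong.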
Encoding T1 ARMv6T2, ARMv7 ADC{S} ,,# 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 1 1 1 0 i 0 1 0 1 0 S Rn 0 imm3 Rd imm8 d = UInt(Rd); n = UInt(Rn); setflags = (S == ‘1’); imm32 = ThumbExpandImm(i:imm3:imm8); if BadReg(d) || BadReg(n) then UNPREDICTABLE; Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7 ADC{S} ,,# 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 cond 0 0 1 0 1 0 1 S Rn Rd imm12 if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions; d = UInt(Rd); n = UInt(Rn); setflags = (S == ‘1’); imm32 = ARMExpandImm(imm12); A8-14 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Instruction Details Assembler syntax ADC{S} {,} , # where: S If S is present, the instruction updates the flags. Otherwise, the flags are not updated. See Standard assembler syntax fields on page A8-7. The destination register. The first operand register. The immediate value to be added to the value obtained from . See Modified immediate constants in Thumb instructions on page A6-17 or Modified immediate constants in ARM instructions on page A5-9 for the range of values. The pre-UAL syntax ADCS is equivalent to ADCS. Operation if ConditionPassed() then EncodingSpecificOperations(); (result, carry, overflow) = AddWithCarry(R[n], imm32, APSR.C); if d == 15 then // Can only occur for ARM encoding ALUWritePC(result); // setflags is always FALSE here else R[d] = result; if setflags then APSR.N = result<31>; APSR.Z = IsZeroBit(result); APSR.C = carry; APSR.V = overflow; Exceptions None. ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A8-15 Instruction Details A8.6.2 ADC (register) Add with Carry (register) adds a register value, the carry flag value, and an optionally-shifted register value, and writes the result to the destination register. It can optionally update the condition flags based on the result. 
Encoding T1 ADCS , ADC , ARMv4T, ARMv5T*, ARMv6*, ARMv7 Outside IT block. Inside IT block. 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 1 0 0 0 0 0 1 0 1 Rm Rdn d = UInt(Rdn); n = UInt(Rdn); m = UInt(Rm); setflags = !InITBlock(); (shift_t, shift_n) = (SRType_LSL, 0); Encoding T2 ARMv6T2, ARMv7 ADC{S}.W ,,{,} 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 1 1 0 1 0 1 1 0 1 0 S Rn (0) imm3 Rd imm2 type Rm d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’); (shift_t, shift_n) = DecodeImmShift(type, imm3:imm2); if BadReg(d) || BadReg(n) || BadReg(m) then UNPREDICTABLE; Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7 ADC{S} ,,{,} 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 cond 0 0 0 0 1 0 1 S Rn Rd imm5 type 0 Rm if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions; d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’); (shift_t, shift_n) = DecodeImmShift(type, imm5); A8-16 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Instruction Details Assembler syntax ADC{S} {,} , {,} where: S If S is present, the instruction updates the flags. Otherwise, the flags are not updated. See Standard assembler syntax fields on page A8-7. The destination register. The first operand register. The optionally shifted second operand register. The shift to apply to the value read from . If present, encoding T1 is not permitted. If absent, no shift is applied and any encoding is permitted. Shifts applied to a register on page A8-10 describes the shifts and how they are encoded. In Thumb assembly: • outside an IT block, if ADCS ,, has and both in the range R0-R7, it is assembled using encoding T1 as though ADCS , had been written. • inside an IT block, if ADC ,, has and both in the range R0-R7, it is assembled using encoding T1 as though ADC , had been written. To prevent either of these happening, use the .W qualifier. 
The pre-UAL syntax ADCS is equivalent to ADCS. Operation if ConditionPassed() then EncodingSpecificOperations(); shifted = Shift(R[m], shift_t, shift_n, APSR.C); (result, carry, overflow) = AddWithCarry(R[n], shifted, APSR.C); if d == 15 then // Can only occur for ARM encoding ALUWritePC(result); // setflags is always FALSE here else R[d] = result; if setflags then APSR.N = result<31>; APSR.Z = IsZeroBit(result); APSR.C = carry; APSR.V = overflow; Exceptions None. ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A8-17 Instruction Details A8.6.3 ADC (register-shifted register) Add with Carry (register-shifted register) adds a register value, the carry flag value, and a register-shifted register value. It writes the result to the destination register, and can optionally update the condition flags based on the result. Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7 ADC{S} ,,, 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 cond 0 0 0 0 1 0 1 S Rn Rd Rs 0 type 1 Rm d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); s = UInt(Rs); setflags = (S == ‘1’); shift_t = DecodeRegShift(type); if d == 15 || n == 15 || m == 15 || s == 15 then UNPREDICTABLE; A8-18 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Instruction Details Assembler syntax ADC{S} {,} , , where: S If S is present, the instruction updates the flags. Otherwise, the flags are not updated. See Standard assembler syntax fields on page A8-7. The destination register. The first operand register. The register that is shifted and used as the second operand. The type of shift to apply to the value read from . It must be one of: ASR Arithmetic shift right, encoded as type = 0b10 LSL Logical shift left, encoded as type = 0b00 LSR Logical shift right, encoded as type = 0b01 ROR Rotate right, encoded as type = 0b11. The register whose bottom byte contains the amount to shift by. 
The pre-UAL syntax ADCS is equivalent to ADCS. Operation if ConditionPassed() then EncodingSpecificOperations(); shift_n = UInt(R[s]<7:0>); shifted = Shift(R[m], shift_t, shift_n, APSR.C); (result, carry, overflow) = AddWithCarry(R[n], shifted, APSR.C); R[d] = result; if setflags then APSR.N = result<31>; APSR.Z = IsZeroBit(result); APSR.C = carry; APSR.V = overflow; Exceptions None. ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A8-19 Instruction Details A8.6.4 ADD (immediate, Thumb) This instruction adds an immediate value to a register value, and writes the result to the destination register. It can optionally update the condition flags based on the result. Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7 ADDS ,,# Outside IT block. ADD ,,# Inside IT block. 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 0 0 1 1 1 0 imm3 Rn Rd d = UInt(Rd); n = UInt(Rn); setflags = !InITBlock(); imm32 = ZeroExtend(imm3, 32); Encoding T2 ARMv4T, ARMv5T*, ARMv6*, ARMv7 ADDS ,# Outside IT block. ADD ,# Inside IT block. 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 0 1 1 0 Rdn imm8 d = UInt(Rdn); n = UInt(Rdn); setflags = !InITBlock(); imm32 = ZeroExtend(imm8, 32); Encoding T3 ARMv6T2, ARMv7 ADD{S}.W ,,# 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 1 1 1 0 i 0 1 0 0 0 S Rn 0 imm3 Rd imm8 if Rd == ‘1111’ && S == ‘1’ then SEE CMN (immediate); if Rn == ‘1101’ then SEE ADD (SP plus immediate); d = UInt(Rd); n = UInt(Rn); setflags = (S == ‘1’); imm32 = ThumbExpandImm(i:imm3:imm8); if BadReg(d) || n == 15 then UNPREDICTABLE; Encoding T4 ARMv6T2, ARMv7 ADDW ,,# 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 1 1 1 0 i 1 0 0 0 0 0 Rn 0 imm3 Rd imm8 if Rn == ‘1111’ then SEE ADR; if Rn == ‘1101’ then SEE ADD (SP plus immediate); d = UInt(Rd); n = UInt(Rn); setflags = FALSE; imm32 = ZeroExtend(i:imm3:imm8, 32); if BadReg(d) then UNPREDICTABLE; A8-20 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. 
All rights reserved. ARM DDI 0406B Instruction Details Assembler syntax ADD{S} {,} , # ADDW {,} , # All encodings permitted Only encoding T4 permitted where: S If S is present, the instruction updates the flags. Otherwise, the flags are not updated. See Standard assembler syntax fields on page A8-7. The destination register. The first operand register. If is SP, see ADD (SP plus immediate) on page A8-28. If is PC, see ADR on page A8-32. The immediate value to be added to the value obtained from . The range of values is 0-7 for encoding T1, 0-255 for encoding T2 and 0-4095 for encoding T4. See Modified immediate constants in Thumb instructions on page A6-17 for the range of values for encoding T3. When multiple encodings of the same length are available for an instruction, encoding T3 is preferred to encoding T4 (if encoding T4 is required, use the ADDW syntax). Encoding T1 is preferred to encoding T2 if is specified and encoding T2 is preferred to encoding T1 if is omitted. The pre-UAL syntax ADDS is equivalent to ADDS. Operation if ConditionPassed() then EncodingSpecificOperations(); (result, carry, overflow) = AddWithCarry(R[n], imm32, ‘0’); R[d] = result; if setflags then APSR.N = result<31>; APSR.Z = IsZeroBit(result); APSR.C = carry; APSR.V = overflow; Exceptions None. ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A8-21 Instruction Details A8.6.5 ADD (immediate, ARM) This instruction adds an immediate value to a register value, and writes the result to the destination register. It can optionally update the condition flags based on the result. 
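The ARM encoding of this instruction takes a modified immediate constant rather than an arbitrary 32-bit value. The scheme itself is described in Modified immediate constants in ARM instructions on page A5-9 and not in this section, so the Python sketch below rests on an assumption: that the 12-bit field holds a 4-bit rotation in its top bits and an 8-bit value in its low bits, and that the constant is the 8-bit value rotated right by twice the rotation. The names ror32, arm_expand_imm, and encode_arm_imm are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def arm_expand_imm(imm12):
    """Expand a 12-bit field (rotation:value) into a 32-bit constant."""
    rotation = (imm12 >> 8) & 0xF
    return ror32(imm12 & 0xFF, 2 * rotation)


def encode_arm_imm(value):
    """Return a 12-bit field that expands to value, or None if none exists.

    Tries rotations in increasing order and returns the first that fits.
    """
    for rotation in range(16):
        # Undo the right rotation by rotating left by the same amount.
        unrotated = ror32(value, 32 - 2 * rotation)
        if unrotated < 256:
            return (rotation << 8) | unrotated
    return None
```

Under these assumptions, a value such as 0xFF000000 is encodable because it is an 8-bit pattern at an even rotation, whereas 0x101 is not, because its two set bits cannot both fit in any 8-bit window.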
Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7 ADD{S} ,,# 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 cond 0 0 1 0 1 0 0 S Rn Rd imm12 if Rn == ‘1111’ && S == ‘0’ then SEE ADR; if Rn == ‘1101’ then SEE ADD (SP plus immediate); if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions; d = UInt(Rd); n = UInt(Rn); setflags = (S == ‘1’); imm32 = ARMExpandImm(imm12); A8-22 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Instruction Details Assembler syntax ADD{S} {,} , # where: S If S is present, the instruction updates the flags. Otherwise, the flags are not updated. See Standard assembler syntax fields on page A8-7. The destination register. The first operand register. If the SP is specified for , see ADD (SP plus immediate) on page A8-28. If the PC is specified for , see ADR on page A8-32. The immediate value to be added to the value obtained from . See Modified immediate constants in ARM instructions on page A5-9 for the range of values. The pre-UAL syntax ADDS is equivalent to ADDS. Operation if ConditionPassed() then EncodingSpecificOperations(); (result, carry, overflow) = AddWithCarry(R[n], imm32, ‘0’); if d == 15 then ALUWritePC(result); // setflags is always FALSE here else R[d] = result; if setflags then APSR.N = result<31>; APSR.Z = IsZeroBit(result); APSR.C = carry; APSR.V = overflow; Exceptions None. ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A8-23 Instruction Details A8.6.6 ADD (register) This instruction adds a register value and an optionally-shifted register value, and writes the result to the destination register. It can optionally update the condition flags based on the result. Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7 ADDS ,, Outside IT block. ADD ,, Inside IT block. 
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 0 0 1 1 0 0 Rm Rn Rd d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = !InITBlock(); (shift_t, shift_n) = (SRType_LSL, 0); Encoding T2 ARMv6T2, ARMv7 if and are both from R0-R7 ARMv4T, ARMv5T*, ARMv6*, ARMv7 otherwise ADD , If is the PC, must be outside or last in IT block. 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 1 0 0 0 1 0 0 DN Rm Rdn if (DN:Rdn) == ‘1101’ || Rm == ‘1101’ then SEE ADD (SP plus register); d = UInt(DN:Rdn); n = d; m = UInt(Rm); setflags = FALSE; (shift_t, shift_n) = (SRType_LSL, 0); if n == 15 && m == 15 then UNPREDICTABLE; if d == 15 && InITBlock() && !LastInITBlock() then UNPREDICTABLE; Encoding T3 ARMv6T2, ARMv7 ADD{S}.W ,,{,} 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 1 1 0 1 0 1 1 0 0 0 S Rn (0) imm3 Rd imm2 type Rm if Rd == ‘1111’ && S == ‘1’ then SEE CMN (register); if Rn == ‘1101’ then SEE ADD (SP plus register); d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’); (shift_t, shift_n) = DecodeImmShift(type, imm3:imm2); if BadReg(d) || n == 15 || BadReg(m) then UNPREDICTABLE; Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7 ADD{S} ,,{,} 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 cond 0 0 0 0 1 0 0 S Rn Rd imm5 type 0 Rm if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions; if Rn == ‘1101’ then SEE ADD (SP plus register); d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’); (shift_t, shift_n) = DecodeImmShift(type, imm5); A8-24 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Instruction Details Assembler syntax ADD{S} {,} , {,} where: S If S is present, the instruction updates the flags. Otherwise, the flags are not updated. See Standard assembler syntax fields on page A8-7. The destination register. If omitted, is the same as and encoding T2 is preferred to encoding T1 inside an IT block. If is present, encoding T1 is preferred to encoding T2. 
The first operand register. If is SP, see ADD (SP plus register) on page A8-30. The register that is optionally shifted and used as the second operand. The shift to apply to the value read from . If present, only encoding T3 or A1 is permitted. If omitted, no shift is applied and any encoding is permitted. Shifts applied to a register on page A8-10 describes the shifts and how they are encoded. In Thumb assembly, inside an IT block, if ADD ,, cannot be assembled using encoding T1, it is assembled using encoding T2 as though ADD , had been written. To prevent this happening, use the .W qualifier. The pre-UAL syntax ADDS is equivalent to ADDS. Operation if ConditionPassed() then EncodingSpecificOperations(); shifted = Shift(R[m], shift_t, shift_n, APSR.C); (result, carry, overflow) = AddWithCarry(R[n], shifted, ‘0’); if d == 15 then ALUWritePC(result); // setflags is always FALSE here else R[d] = result; if setflags then APSR.N = result<31>; APSR.Z = IsZeroBit(result); APSR.C = carry; APSR.V = overflow; Exceptions None. ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A8-25 Instruction Details A8.6.7 ADD (register-shifted register) Add (register-shifted register) adds a register value and a register-shifted register value. It writes the result to the destination register, and can optionally update the condition flags based on the result. Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7 ADD{S} ,,, 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 cond 0 0 0 0 1 0 0 S Rn Rd Rs 0 type 1 Rm d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); s = UInt(Rs); setflags = (S == ‘1’); shift_t = DecodeRegShift(type); if d == 15 || n == 15 || m == 15 || s == 15 then UNPREDICTABLE; A8-26 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Instruction Details Assembler syntax ADD{S} {,} , , where: S If S is present, the instruction updates the flags. Otherwise, the flags are not updated. 
See Standard assembler syntax fields on page A8-7. The destination register. The first operand register. The register that is shifted and used as the second operand. The type of shift to apply to the value read from . It must be one of: ASR Arithmetic shift right, encoded as type = 0b10 LSL Logical shift left, encoded as type = 0b00 LSR Logical shift right, encoded as type = 0b01 ROR Rotate right, encoded as type = 0b11. The register whose bottom byte contains the amount to shift by. The pre-UAL syntax ADDS is equivalent to ADDS. Operation if ConditionPassed() then EncodingSpecificOperations(); shift_n = UInt(R[s]<7:0>); shifted = Shift(R[m], shift_t, shift_n, APSR.C); (result, carry, overflow) = AddWithCarry(R[n], shifted, ‘0’); R[d] = result; if setflags then APSR.N = result<31>; APSR.Z = IsZeroBit(result); APSR.C = carry; APSR.V = overflow; Exceptions None. ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A8-27 Instruction Details A8.6.8 ADD (SP plus immediate) This instruction adds an immediate value to the SP value, and writes the result to the destination register. 
Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7 ADD ,SP,# 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 0 1 0 1 Rd imm8 d = UInt(Rd); setflags = FALSE; imm32 = ZeroExtend(imm8:’00’, 32); Encoding T2 ARMv4T, ARMv5T*, ARMv6*, ARMv7 ADD SP,SP,# 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 101100000 imm7 d = 13; setflags = FALSE; imm32 = ZeroExtend(imm7:’00’, 32); Encoding T3 ARMv6T2, ARMv7 ADD{S}.W ,SP,# 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 1 1 1 0 i 0 1 0 0 0 S 1 1 0 1 0 imm3 Rd imm8 if Rd == ‘1111’ && S == ‘1’ then SEE CMN (immediate); d = UInt(Rd); setflags = (S == ‘1’); imm32 = ThumbExpandImm(i:imm3:imm8); if d == 15 then UNPREDICTABLE; Encoding T4 ARMv6T2, ARMv7 ADDW ,SP,# 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 1 1 1 0 i 1 0 0 0 0 0 1 1 0 1 0 imm3 Rd imm8 d = UInt(Rd); setflags = FALSE; imm32 = ZeroExtend(i:imm3:imm8, 32); if d == 15 then UNPREDICTABLE; Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7 ADD{S} ,SP,# 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 cond 0 0 1 0 1 0 0 S 1 1 0 1 Rd imm12 if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions; d = UInt(Rd); setflags = (S == ‘1’); imm32 = ARMExpandImm(imm12); A8-28 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Instruction Details Assembler syntax ADD{S} {,} SP, # ADDW {,} SP, # All encodings permitted Only encoding T4 is permitted where: S If S is present, the instruction updates the flags. Otherwise, the flags are not updated. See Standard assembler syntax fields on page A8-7. The destination register. If omitted, is SP. The immediate value to be added to the value obtained from SP. Values are multiples of 4 in the range 0-1020 for encoding T1, multiples of 4 in the range 0-508 for encoding T2 and any value in the range 0-4095 for encoding T4. 
See Modified immediate constants in Thumb instructions on page A6-17 or Modified immediate constants in ARM instructions on page A5-9 for the range of values for encodings T3 and A1. When both 32-bit encodings are available for an instruction, encoding T3 is preferred to encoding T4 (if encoding T4 is required, use the ADDW syntax). The pre-UAL syntax ADDS is equivalent to ADDS. Operation if ConditionPassed() then EncodingSpecificOperations(); (result, carry, overflow) = AddWithCarry(SP, imm32, ‘0’); if d == 15 then // Can only occur for ARM encoding ALUWritePC(result); // setflags is always FALSE here else R[d] = result; if setflags then APSR.N = result<31>; APSR.Z = IsZeroBit(result); APSR.C = carry; APSR.V = overflow; Exceptions None. ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A8-29 Instruction Details A8.6.9 ADD (SP plus register) This instruction adds an optionally-shifted register value to the SP value, and writes the result to the destination register. 
Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7 ADD , SP, 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 1 0 0 0 1 0 0 DM 1 1 0 1 Rdm d = UInt(DM:Rdm); m = UInt(DM:Rdm); setflags = FALSE; (shift_t, shift_n) = (SRType_LSL, 0); Encoding T2 ADD SP, ARMv4T, ARMv5T*, ARMv6*, ARMv7 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 1 0 0 0 1 0 0 1 Rm 1 0 1 if Rm == ‘1101’ then SEE encoding T1; d = 13; m = UInt(Rm); setflags = FALSE; (shift_t, shift_n) = (SRType_LSL, 0); Encoding T3 ARMv6T2, ARMv7 ADD{S}.W ,SP,{,} 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 1 1 0 1 0 1 1 0 0 0 S 1 1 0 1 0 imm3 Rd imm2 type Rm d = UInt(Rd); m = UInt(Rm); setflags = (S == ‘1’); (shift_t, shift_n) = DecodeImmShift(type, imm3:imm2); if d == 13 && (shift_t != SRType_LSL || shift_n > 3) then UNPREDICTABLE; if d == 15 || BadReg(m) then UNPREDICTABLE; Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7 ADD{S} ,SP,{,} 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 cond 0 0 0 0 1 0 0 S 1 1 0 1 Rd imm5 type 0 Rm if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions; d = UInt(Rd); m = UInt(Rm); setflags = (S == ‘1’); (shift_t, shift_n) = DecodeImmShift(type, imm5); A8-30 Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Instruction Details Assembler syntax ADD{S} {,} SP, {, } where: S If S is present, the instruction updates the flags. Otherwise, the flags are not updated. See Standard assembler syntax fields on page A8-7. The destination register. This register can be SP. If omitted, is SP. This register can be the PC, but if it is, encoding T3 is not permitted. Using the PC is deprecated. The register that is optionally shifted and used as the second operand. This register can be the PC, but if it is, encoding T3 is not permitted. Using the PC is deprecated. 
This register can be SP in both ARM and Thumb instructions, but: • the use of SP is deprecated • when assembling for the Thumb instruction set, only encoding T1 is available and so the instruction can only be ADD SP,SP,SP. The shift to apply to the value read from . If omitted, no shift is applied and any encoding is permitted. If present, only encoding T3 or A1 is permitted. Shifts applied to a register on page A8-10 describes the shifts and how they are encoded. In the Thumb instruction set, if is SP or omitted, is only permitted to be omitted, LSL #1, LSL #2, or LSL #3. The pre-UAL syntax ADDS is equivalent to ADDS. Operation if ConditionPassed() then EncodingSpecificOperations(); shifted = Shift(R[m], shift_t, shift_n, APSR.C); (result, carry, overflow) = AddWithCarry(SP, shifted, ‘0’); if d == 15 then ALUWritePC(result); // setflags is always FALSE here else R[d] = result; if setflags then APSR.N = result<31>; APSR.Z = IsZeroBit(result); APSR.C = carry; APSR.V = overflow; Exceptions None. ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A8-31 Instruction Details A8.6.10 ADR This instruction adds an immediate value to the PC value to form a PC-relative address, and writes the result to the destination register. Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7 ADR ,