STM32F4 Floating-Point Unit

Document summary

An English-language introduction to the STM32 floating-point unit

Document preview

STM32F4 Core, DSP, FPU & Library

Content
- A practical introduction to fixed/floating point
- A practical introduction to the floating-point unit
- Tips and comments on floating-point usage

A practical introduction to fixed/floating point

What are these?
Half precision, Q15, QNaN, Q31, -∞, de-normalized, +∞, NaN, Qx.y, integer, double precision, binary16, +0, IEEE754, binary32, normalized, -0, SNaN, binary64, single precision

Half / single / double precision
- Half precision: 16-bit coding (called binary16 in IEEE754-2008): 1-bit sign, 5-bit exponent, 10-bit mantissa
- Single precision: 32-bit coding (called binary32 in IEEE754-2008): 1-bit sign, 8-bit exponent, 23-bit mantissa
- Double precision: 64-bit coding (called binary64 in IEEE754-2008): 1-bit sign, 11-bit exponent, 52-bit mantissa

Let's compare 8-bit formats

Integer formats
- Unsigned integer: 8-bit integer part. Represented value = 8-bit integer part
- Signed integer: 1-bit sign, 7-bit integer part. Represented value = $(-1)^{\text{sign}} \times (\text{7-bit integer part})$

Fixed-point format Qx.y
- Q4.3 format: 1-bit sign, 4-bit integer part (x=4), 3-bit fractional part (y=3). Represented value = $(-1)^{\text{sign}} \times 2^{-3} \times (\text{7-bit integer part})$
- Q0.7 format: 1-bit sign, 7-bit fractional part (y=7). Represented value = $(-1)^{\text{sign}} \times 2^{-7} \times (\text{7-bit integer part})$

Floating-point format (IEEE754-like and non-IEEE754-like): 1-bit sign, 4-bit exponent, 3-bit mantissa
- IEEE754-like normalized value: $(-1)^{\text{sign}} \times 2^{(\text{exponent} - \text{bias})} \times \text{mantissa}$
- IEEE754-like de-normalized value (also called subnormal): $(-1)^{\text{sign}} \times 2^{(1 - \text{bias})} \times \text{mantissa}$
Note: this 8-bit floating-point format is not standard; it is used for illustration purposes only.

8-bit formats comparison
[Figure: distribution of representable values for IEEE754-like (8 bits), fixed point Q0.7, fixed point Q4.3, signed integers and unsigned integers, over the ranges -260 to +260 and -5 to 5]
Note: all these formats have 256 discrete values; only their distribution differs.

8-bit formats comparison (continued)
[Figure: the same five formats over the ranges -1 to +1 and -0.1 to 0.1]

Summary of IEEE 754 number coding
The IEEE754-2008 standard defines these formats:

Format                       Sign    Exponent  Mantissa
Binary16 / Half precision    1 bit   5 bits    10 bits (+1 implied bit for normalized numbers)
Binary32 / Single precision  1 bit   8 bits    23 bits (+1 implied bit for normalized numbers)
Binary64 / Double precision  1 bit   11 bits   52 bits (+1 implied bit for normalized numbers)

Normalized / de-normalized numbers:

Sign  Exponent    Mantissa  IEEE754-2008
-     0           !=0       De-normalized number (mantissa without implied MSB)
-     [1, Max-1]  -         Normalized number (mantissa with one implied MSB)

Each format also contains special numbers:

Sign  Exponent  Mantissa    IEEE754-2008
0     0         0           +0
1     0         0           -0
0     Max       0           +infinity
1     Max       0           -infinity
-     Max       !=0, MSB=1  QNaN (Quiet Not a Number)
-     Max       !=0, MSB=0  SNaN (Signaling Not a Number)

Floating points: rounding issues
- The precision has limits
- Rounding errors can accumulate across operations and may give inaccurate results (do not do financial operations with floats…)
- A few examples:
  - If you are working on two numbers with different exponents, the hardware automatically «denormalizes» one of them so that the calculation is done with a common exponent
  - If you subtract two numbers that are very close, you lose relative precision (also called cancellation error)
  - If you reorder the operations, you may not obtain the same result, because of the rounding errors…

A practical introduction to the floating-point unit

Benefits of a Floating-Point Unit
Execution time comparison for a 29-coefficient FIR on float 32 with and without FPU (CMSIS library)
[Figure: execution time, No FPU vs. FPU]
- 10x improvement in execution time
- Best compromise: development time vs. performance

Code comparison with & without FPU

    float function1(float number1, float number2)
    {
        float temp1, temp2;
        temp1 = number1 + number2;
        temp2 = number1/temp1;
        return temp2;
    }

Code compiled on Cortex-M3 (calls the soft-FPU routines of Keil's software library):

    # float function1(…)
    # {…
    # temp1 = number1 + number2;
        MOVS R1,R4
        BL __aeabi_fadd
        MOVS R1,R0
    # temp2 = number1/temp1;
        MOVS R0,R4
        BL __aeabi_fdiv
    # return temp2;
        POP {R4,PC}
    # }

Same code compiled on Cortex-M4F (FPU assembly instructions):

    # float function1(…)
    # {…
    # temp1 = number1 + number2;
        VADD.F32 S1,S0,S1
    # temp2 = number1/temp1;
        VDIV.F32 S0,S0,S1
    # return temp2;
        BX LR
    # }

Cortex-M4: Floating point unit features
- Single precision FPU
- Conversion between
  - Integer numbers
  - Single precision floating point numbers
  - Half precision floating point numbers
- Handling of floating point exceptions (untrapped)
- Dedicated registers
  - 32 single precision registers (S0-S31), which can be viewed as 16 doubleword registers (D0-D15) for load/store operations
  - FPSCR for status & configuration

FPU instructions

FPU arithmetic instructions

Operation         Description                        Assembler   Cycles
Absolute value    of float                           VABS.F32    1
Negate            float                              VNEG.F32    1
                  and multiply float                 VNMUL.F32   1
Addition          floating point                     VADD.F32    1
Subtract          float                              VSUB.F32    1
Multiply          float                              VMUL.F32    1
                  then accumulate float              VMLA.F32    3
                  then subtract float                VMLS.F32    3
                  then accumulate then negate float  VNMLA.F32   3
                  then subtract then negate float    VNMLS.F32   3
Multiply (fused)  then accumulate float              VFMA.F32    3
                  then subtract float                VFMS.F32    3
                  then accumulate then negate float  VFNMA.F32   3
                  then subtract then negate float    VFNMS.F32   3
Divide            float                              VDIV.F32    14
Square-root       of float                           VSQRT.F32   14

FPU Load/Store/Compare/Convert

Operation  Description                                             Assembler  Cycles
Load       multiple doubles (N doubles)                            VLDM.64    1+2*N
           multiple floats (N floats)                              VLDM.32    1+N
           single double                                           VLDR.64    3
           single float                                            VLDR.32    2
Store      multiple double registers (N doubles)                   VSTM.64    1+2*N
           multiple float registers (N floats)                     VSTM.32    1+N
           single double register                                  VSTR.64    3
           single float register                                   VSTR.32    2
Move       top/bottom half of double to/from core register         VMOV       1
           immediate/float to float-register                       VMOV       1
           two floats/one double to/from core registers            VMOV       2
           one float to/from core register                         VMOV       1
           floating-point control/status to core register          VMRS       1
           core register to floating-point control/status          VMSR       1
Pop        double registers from stack                             VPOP.64    1+2*N
           float registers from stack                              VPOP.32    1+N
Push       double registers to stack                               VPUSH.64   1+2*N
           float registers to stack                                VPUSH.32   1+N
Compare    float with register or zero                             VCMP.F32   1
           float with register or zero                             VCMPE.F32  1
Convert    between integer, fixed-point, half precision and float  VCVT.F32   1

Important information
- The floating point unit IS compliant with IEEE754-2008
- The floating point unit does NOT support all operations of IEEE 754-2008
- Unsupported operations
  - Remainder
  - Round FP number to integer-valued FP number
  - Binary to decimal conversions
  - Decimal to binary conversions
  - Direct comparison of SP and DP values
- Full implementation is done by software

IEEE754 compliance
The Cortex-M4 floating point unit is IEEE754 compliant:
- The rounding mode is selected in the FPSCR register (round to nearest even by default)

Sign  Exponent  Mantissa    Compliant options               Non-compliant option
                            (FZ=0 and AHP=0 and DN=0)       (FZ=1 or AHP=1 or DN=1)
-     0         !=0         De-normalized number            Flush to zero
0     Max       0           +infinity                       Alternate Half Precision
1     Max       0           -infinity                       Alternate Half Precision
-     Max       !=0, MSB=1  QNaN (Quiet Not a Number)       Default NaN, Alternate Half Precision
-     Max       !=0, MSB=0  SNaN (Signaling Not a Number)   Default NaN, Alternate Half Precision

Some non-compliant options are available in the FPSCR register:
- Flush to zero (FZ bit): de-normalized numbers are flushed to zero
- Alternate Half Precision format (AHP bit): special numbers (exponent = all "1") are treated as normalized numbers
- Default NaN (DN bit): a different way to handle Not a Number values

STM32F4 - Non IEEE754 compliant format
[Figure: representable values of the 8-bit floating point format in three variants (IEEE754-like; not IEEE754-like with FZ=1; not IEEE754-like with AHP=1), over the ranges -500 to 500 and -0.1 to 0.1]
These are simulations using an 8-bit format representation:
- Flush to zero (FZ bit) applies to the 16-bit (half precision) and 32-bit (single precision) formats
- Alternate Half Precision format (AHP bit) applies to the 16-bit (half precision) format only

STM32F4 - Floating point exceptions
The FPU supports the 5 IEEE754 exceptions and adds a specific exception:
- Invalid operation (IEEE754)
- Division by zero (IEEE754)
- Overflow (IEEE754)
- Underflow (IEEE754)
- Inexact (IEEE754)
- Input denormal (flush to zero mode only)

- These flags are in the FPSCR register
- When flush to zero mode is used:
  - the FPU adds a specific exception: input denormal
  - the FPU handles the underflow and inexact exceptions in a non-IEEE754 way
- The exceptions are not trapped
  - This is compliant with IEEE754
- The value returned by the instruction generating an exception is a default result.
- Examples:
  - 1234 / 0 => the division by zero flag is set; the returned value is +infinity
  - Sqrt(-1) => the invalid operation flag is set; the returned value is QNaN
Note: for details on each exception, as well as the default value returned when such an exception occurs, please refer to the ARMv7-M Architecture Reference Manual.

FPU programmers model

Address     Name    Type  Description
0xE000EF34  FPCCR   RW    FP Context Control Register
0xE000EF38  FPCAR   RW    FP Context Address Register
0xE000EF3C  FPDSCR  RW    FP Default Status Control Register
0xE000EF40  MVFR0   RO    Media and VFP Feature Register 0
0xE000EF44  MVFR1   RO    Media and VFP Feature Register 1

- Floating-Point Context Control Register
  - Indicates the context when the FP stack frame has been allocated
  - Context preservation setting
- Floating-Point Context Address Register
  - Points to the stack location reserved for S0
- Floating-Point Default Status Control Register
  - Holds the default values for Alternative half-precision mode, Default NaN mode, Flush to zero mode and Rounding mode
- Media & FP Feature Register 0 & 1
  - Detail the supported modes, instruction precision and additional hardware support

About the Stack Frame
The stack frame differs with and without the FPU.

Frame without FPU (basic frame):

Offset  Content
0x1C    xPSR
0x18    ReturnAddress
0x14    LR (R14)
0x10    R12
0x0C    R3
0x08    R2
0x04    R1
0x00    R0

Frame with FPU (extended frame):

Offset  Content
0x64    Reserved
0x60    FPSCR
0x5C    S15
…       …
0x20    S0
0x1C    xPSR
0x18    ReturnAddress
0x14    LR (R14)
0x10    R12
0x0C    R3
0x08    R2
0x04    R1
0x00    R0

About the Stack Frame (continued)
Depending on the Floating-Point Context Control Register configuration, the core handles the stack in different ways:
- ASPEN = 0: only the basic frame (xPSR, ReturnAddress, LR (R14), R12, R3, R2, R1, R0) is stacked
- ASPEN = 1, LSPEN = 1: the area for Reserved, FPSCR and S15 … S0 is reserved above the basic frame, but the registers are not pushed automatically
- ASPEN = 1, LSPEN = 0: the registers are pushed automatically (Reserved, FPSCR, S15 … S0, then the basic frame)

Lazy context save (default after reset)
Configuration: ASPEN = 1, LSPEN = 1. The stack area above the basic frame is reserved but not written.
In lazy mode, the FP context is not saved on exception entry.
- This reduces the exception latency
- While keeping it simple for the user to push the values if needed
If the ISR executes a floating point instruction, the core first:
- Retrieves the address of the reserved area from the FPCAR register
- Saves the FP state, S0-S15 and the FPSCR, to that area
- Sets the FPCCR.LSPACT bit to 0, to indicate that lazy state preservation is no longer active
- It can then process the FPU instruction

Tips and comments on floating-point usage

What type to use?
What is the difference between:

    double a = (double) 1.1234
    double b = 1.1234
    double c = (float) 1.1234
    double d = 1.1234f
    float a = (double) 1.1234
    float b = 1.1234
    float c = (float) 1.1234
    float d = 1.1234f
    float e = a + b
    float f = a + b + (float) 1.1234
    float f = a + b + 1.1234
    float f = a + b + 1.1234f

To avoid:
- Compiler-dependent behavior
- Implicit conversions
- The use of an unexpected type
- The use of the double precision software library when intending to use the hardware FPU
it is recommended to always specify the type explicitly, using:

    float a = (float) 1.234
    float a = 1.234f
    double a = (double) 1.234

A practical example of the rounding issue

First case, seven additions of 0.0000001f against one addition of 0.0000007f:

    float sp_a, sp_b;
    sp_a = 0.9999996f;
    sp_a += 0.0000001f;
    sp_a += 0.0000001f;
    sp_a += 0.0000001f;
    sp_a += 0.0000001f;
    sp_a += 0.0000001f;
    sp_a += 0.0000001f;
    sp_a += 0.0000001f;
    sp_b = 0.9999996f;
    sp_b += 0.0000007f;
    if (sp_b == sp_a) { sp_a = 1; } else { sp_a = 0; }

Second case, four additions of 0.0000001f against one addition of 0.0000004f:

    sp_a = 0.9999996f;
    sp_a += 0.0000001f;
    sp_a += 0.0000001f;
    sp_a += 0.0000001f;
    sp_a += 0.0000001f;
    sp_b = 0.9999996f;
    sp_b += 0.0000004f;
    if (sp_b == sp_a) { sp_a = 1; } else { sp_a = 0; }

Floats cannot be compared directly.
A better approach (but not perfect):

    if(sp_a-sp_b
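The preview breaks off in the middle of the comparison code, so the slide's own fix is not visible here. The following is only a rough sketch of the two tips above, not the original code: it shows that an unsuffixed literal turns a float expression into a double one, and one possible tolerance-based comparison. All function names and the relative-tolerance approach are assumptions chosen for illustration.

```c
#include <float.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes of the result type when a float is
   added to an unsuffixed literal. 1.1234 has type double, so the float
   operand is converted to double before the addition. */
size_t size_with_double_literal(void)
{
    float a = 1.0f;
    return sizeof(a + 1.1234);
}

/* Same expression with an f suffix: both operands have type float. */
size_t size_with_float_literal(void)
{
    float a = 1.0f;
    return sizeof(a + 1.1234f);
}

/* Adds step to start n times, storing the sum in a float after every
   addition, as the slide's sp_a sequence does. */
float accumulate(float start, float step, int n)
{
    float acc = start;
    for (int i = 0; i < n; ++i) {
        acc += step;
    }
    return acc;
}

/* Hypothetical helper: true when a and b differ by at most rel_tol times
   the larger of their magnitudes. Values very close to zero would need an
   additional absolute tolerance, which this sketch does not include. */
bool nearly_equal(float a, float b, float rel_tol)
{
    float diff = (a > b) ? (a - b) : (b - a);
    float mag_a = (a < 0.0f) ? -a : a;
    float mag_b = (b < 0.0f) ? -b : b;
    float largest = (mag_a > mag_b) ? mag_a : mag_b;
    return diff <= largest * rel_tol;
}
```

A call such as `nearly_equal(accumulate(0.9999996f, 0.0000001f, 7), 0.9999996f + 0.0000007f, 4.0f * FLT_EPSILON)` is expected to hold on an IEEE 754 single-precision target even where `==` reports a difference. The tolerance of a few `FLT_EPSILON` is an arbitrary choice here and has to be sized to the number of operations in the real computation.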
