Design and Implementation of 32-Bit Inexact Floating Point Arithmetic Unit

In computing, a floating-point format represents real numbers approximately, by formula, so as to support a trade-off between range and precision. For this reason, floating-point computation is common in systems that handle both very small and very large real numbers and require fast processing times. In general, a floating-point number is represented as two sequences of bits: one, called the mantissa, holds the digits of the number, and the other, the exponent, determines the position of the radix point. Traditional floating-point arithmetic computes accurate results for all applications, which requires high power. However, power has become a key constraint in nanoscale integrated circuit design because of the increasing demand for mobile computing and higher integration density. As an emerging computational paradigm, inexact circuits offer a promising approach to significantly reducing both static and dynamic power dissipation in error-tolerant applications. The objective of this project is to implement an inexact 32-bit binary floating-point arithmetic unit, comprising a floating-point adder, subtractor and multiplier, with improved performance. A pipelined architecture is used to raise the operating frequency and thereby the performance. The related logic includes both a normalizer and a rounder matched to the inexact mantissa and exponent parts. Floating-point arithmetic is handled by the FP add, FP sub and FP mul operations. FP add adds the value in the floating-point operand to the floating-point accumulator. FP sub subtracts the value in the floating-point operand from the floating-point accumulator. FP mul multiplies the value in the floating-point accumulator by the floating-point operand. The proposed architecture is simulated and synthesized using Xilinx ISE 14.7.
Keywords - Floating Point adder, Floating Point subtractor, Floating Point multiplier, Dadda multiplier
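
As an illustration of the mantissa and exponent representation described in the abstract, the following minimal Python sketch splits a 32-bit value into its fields and then discards low-order mantissa bits. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits), which the abstract does not state explicitly. The function names fields and truncate_mantissa are hypothetical, and mantissa truncation is only one generic way to model inexactness in software; it is not necessarily the scheme used in the proposed hardware.

```python
import struct


def fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to 32 bits) into (sign, biased exponent, fraction).

    Assumes the IEEE 754 single-precision layout: 1 + 8 + 23 bits.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # biased by 127
    fraction = bits & 0x7FFFFF       # 23 stored mantissa bits
    return sign, exponent, fraction


def truncate_mantissa(x: float, drop: int) -> float:
    """Zero the `drop` least significant fraction bits of x.

    Illustrative model of an inexact mantissa only (an assumption,
    not the design described in the abstract).
    """
    if not 0 <= drop <= 23:
        raise ValueError("drop must be between 0 and 23")
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    mask = 0xFFFFFFFF ^ ((1 << drop) - 1)
    return struct.unpack(">f", struct.pack(">I", bits & mask))[0]


if __name__ == "__main__":
    print(fields(-2.5))                  # sign, biased exponent, fraction
    print(truncate_mantissa(0.1, 16))    # 0.1 with 16 fraction bits dropped
```

Dropping more fraction bits increases the error in the result; in hardware, the corresponding narrower datapath is what would reduce power, which is the trade-off the abstract refers to for error-tolerant applications.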