S. Khan, E. Casseau, and D. Menard, Reconfigurable SWP Operator for Multimedia Processing, 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2009.
DOI : 10.1109/ASAP.2009.13

URL : https://hal.archives-ouvertes.fr/inria-00432572

S. Khan, E. Casseau, and D. Menard, SWP for multimedia operator design, Proceedings of the 2nd Colloque Nationale of GDR SoC-SIP, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00432578

S. Khan, E. Casseau, and D. Menard, SWP multimedia operator design, Proceedings of the 5th international sciences of electronics, technologies of information and telecommunications conference (SETIT), 2009.
URL : https://hal.archives-ouvertes.fr/inria-00432578

D. Menard, E. Casseau, S. Khan, O. Sentieys, S. Chevobbe et al., Reconfigurable Operator Based Multimedia Embedded Processor, Proceedings of the International Workshop on Applied Reconfigurable Computing
URL : https://hal.archives-ouvertes.fr/inria-00432566

S. Khan, E. Casseau, and D. Menard, High speed reconfigurable SWP operator for multimedia processing using redundant data representation, International journal of information science and computer engineering, vol.1, issue.1, pp.45-52, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00480330

E. Casseau, S. Khan, and B. Le Gal, Multimode architecture design, Proceedings of the Design and Architectures for Signal and Image Processing Workshop
URL : https://hal.archives-ouvertes.fr/hal-00204577

B. Le Gal, E. Casseau, and S. Khan, HLS Design Flow for Multimode IP Generation Under Multiple Constraints, Proceedings of the 14th IEEE International Conference on Electronics, Circuits and Systems (ICECS'07), 2007.
URL : https://hal.archives-ouvertes.fr/hal-00400051

A. Abdelgawad and M. Bayoumi, High Speed and Area-Efficient Multiply Accumulate (MAC) Unit for Digital Signal Processing Applications, Proceedings of the IEEE International Symposium on Circuits and Systems ISCAS, pp.3199-3202, 2007.

L. V. Agostini, I. S. Silva, and S. Bampi, Pipelined fast 2D DCT architecture for JPEG image compression, Symposium on Integrated Circuits and Systems Design, pp.226-231, 2001.
DOI : 10.1109/SBCCI.2001.953032

H. Amano, A Survey on Dynamically Reconfigurable Processors, Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Communications, pp.3179-3187, 2006.
DOI : 10.1093/ietcom/e89-b.12.3179

H. Amano, Y. Hasegawa, S. Tsutsumi, T. Nakamura, T. Nishimura et al., MuCCRA chips: Configurable Dynamically-Reconfigurable Processors, Proceedings of the IEEE Asian Solid-State Circuits Conference (ASSCC'07), pp.384-387, 2007.

N. B. Amor, Y. L. Moullec, J. P. Diguet, J. L. Philippe, and M. Abid, Design of a multimedia processor based on metrics computation, Advances in Engineering Software, pp.448-458, 2005.
DOI : 10.1016/j.advengsoft.2005.01.010

A. B. Atitallah, P. Kadionik, F. Ghozzi, P. Nouel, N. Masmoudi et al., Implementation of Loeffler Algorithm on Stratix DSP compared to Classical FPGA Solutions, Proceedings of the International Symposium on Communications, Control and Signal Processing (SCCSP), Morocco, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00183041

A. Avizienis, Signed-Digit Number Representations for Fast Parallel Arithmetic, Computer arithmetic, pp.389-400, 1961.

M. Bousselmi, M. S. Bouhlel, N. Masmoudi, and L. Kamoun, New parallel architecture of the DCT and its inverse for image compression, ICECS 2000. 7th IEEE International Conference on Electronics, Circuits and Systems (Cat. No.00EX445), pp.345-348, 2000.
DOI : 10.1109/ICECS.2000.911552

C. Brunelli, F. Garzia, and J. Nurmi, A coarse-grain reconfigurable architecture for multimedia applications featuring subword computation capabilities, Journal of Real-Time Image Processing, pp.21-32, 2008.
DOI : 10.1007/s11554-008-0071-3

C. Brunelli, P. Salmela, J. Takala, and J. Nurmi, A flexible multiplier for media processing, IEEE Workshop on Signal Processing Systems Design and Implementation, 2005., pp.70-74, 2005.
DOI : 10.1109/SIPS.2005.1579841

K. Bukhari, G. Kuzmanov, and S. Vassiliadis, DCT and IDCT Implementations on Different FPGA Technologies

A. M. Campos, F. J. Merelo, M. A. Peirot, and J. A. Esteve, Integer-pixel motion estimation H.264/AVC accelerator architecture with optimal memory management, Microprocessors and Microsystems, pp.68-78, 2008.

E. Casseau, S. Khan, and B. Le Gal, Multimode architecture design, Proceedings of the Design and Architectures for Signal and Image Processing Workshop (DASIP'07), 2007.
URL : https://hal.archives-ouvertes.fr/hal-00204577

S. Chatterjee and A. Chakrabarti, Parallel Hardware Design for Motion Estimation, International Journal of Recent Trends in Engineering, vol.1, pp.653-657, 2009.

M. O. Cheema and O. Hammami, Customized SIMD unit synthesis for system on programmable chip: a foundation for HW/SW partitioning with vectorization, Asia and South Pacific Conference on Design Automation, 2006.
DOI : 10.1109/ASPDAC.2006.1594645

J. Choi, N. Togawa, M. Yanagisawa, and T. Ohtsuki, VLSI architecture for a flexible motion estimation with parameters, Proceedings of ASP-DAC/VLSI Design 2002. 7th Asia and South Pacific Design Automation Conference and 15th International Conference on VLSI Design, 2002.
DOI : 10.1109/ASPDAC.2002.994962

K. Compton and S. Hauck, Reconfigurable computing: a survey of systems and software, ACM Computing Surveys, pp.171-210, 2002.
DOI : 10.1145/508352.508353

P. Corsonello, S. Perri, M. A. Iachino, and G. Cocorullo, Variable Precision Arithmetic Circuits for FPGA Based Multimedia Processors, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2004.

A. Danysh and D. Tan, Architecture and implementation of a vector/SIMD multiply-accumulate unit, IEEE Transactions on Computers, pp.284-293, 2005.
DOI : 10.1109/TC.2005.41

R. David, D. Chillet, S. Pillement, and O. Sentieys, DART A Dynamically Reconfigurable Architecture dealing with Next Generation Telecommunications Constraints, Proceedings of the Reconfigurable Architecture Workshop

K. Diefendorff, AltiVec extension to PowerPC accelerates media processing, IEEE Micro, 2000.
DOI : 10.1109/40.848475

M. D. Ercegovac and T. Lang, Digital Arithmetic, 2003.
URL : https://hal.archives-ouvertes.fr/ensl-00542215

D. Efstathiou, J. Fridman, and Z. Zvonar, Recent developments in enabling technologies for the software defined radio, IEEE Communications Magazine, pp.112-117, 1999.

A. A. Farooqui and V. G. Oklobdzija, General data-path organization of a MAC unit for VLSI implementation of DSP processors, ISCAS '98. Proceedings of the 1998 IEEE International Symposium on Circuits and Systems (Cat. No.98CH36187), pp.260-263, 1998.
DOI : 10.1109/ISCAS.1998.706891

A. A. Farooqui, V. G. Oklobdzija, and F. Chechrazi, 64-Bit Media Adder, Proceedings of the IEEE International Symposium on Circuits and Systems, 1999.

J. Fridman, Sub-word parallelism in digital signal processing, IEEE signal processing magazine, pp.27-35, 2000.
DOI : 10.1109/79.826409

J. Fridman, Data alignment for sub-word parallelism in DSP, 1999 IEEE Workshop on Signal Processing Systems. SiPS 99. Design and Implementation (Cat. No.99TH8461), pp.251-260, 2002.
DOI : 10.1109/SIPS.1999.822330

B. Le Gal, E. Casseau, and S. Khan, HLS Design Flow for Multimode IP Generation Under Multiple Constraints, Proceedings of the 14th IEEE International Conference on Electronics, Circuits and Systems (ICECS'07), 2007.
URL : https://hal.archives-ouvertes.fr/hal-00400051

M. Ghanbari, The cross search algorithm for motion estimation, IEEE Transactions on Communications, vol.COM-38, pp.950-953, 1990.

R. Gupta and F. Brewer, High Level Synthesis: A Retrospective, in High-Level Synthesis: From Algorithm to Digital Circuit, pp.13-28, 2008.

S. Gupta, R. Gupta, N. Dutt, and A. Nicolau, SPARK: A Parallelizing Approach to the High-Level Synthesis of Digital Circuits, 2004.

A. Guyot, Y. Herreros, and J. M. Muller, JANUS, an on-line multiplier/divider for manipulating large numbers, Proceedings of 9th Symposium on Computer Arithmetic, pp.106-111, 1989.
DOI : 10.1109/ARITH.1989.72815

URL : https://hal.archives-ouvertes.fr/hal-00014975

J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach

H. C. Hunter and J. H. Moreno, A new look at exploiting data parallelism in embedded systems, Proceedings of the international conference on Compilers, architectures and synthesis for embedded systems, CASES '03, pp.159-169, 2003.
DOI : 10.1145/951710.951733

G. Jaberipur and B. Parhami, Constant-time addition with hybrid-redundant numbers: Theory and implementations, Integration, the VLSI Journal, pp.49-64, 2008.
DOI : 10.1016/j.vlsi.2007.01.002

G. Jaberipur, B. Parhami, and M. Ghodsi, An Efficient Universal Addition Scheme for All Hybrid-Redundant Representations with Weighted Bit-Set Encoding, The Journal of VLSI Signal Processing, pp.149-158, 2006.
DOI : 10.1007/s11265-005-4177-6

J. R. Jain and A. K. Jain, Displacement Measurement and Its Application in Interframe Image Coding, IEEE Transactions on Communications, vol.29, p.1799, 1981.
DOI : 10.1109/TCOM.1981.1094950

M. D. Jennings and T. M. Conte, Subword extensions for video processing on mobile systems, IEEE Concurrency, pp.13-16, 1998.
DOI : 10.1109/4434.708250

X. Jing and L. P. Chau, An Efficient Three-Step Search Algorithm for Block Motion Estimation, IEEE Transactions on Multimedia, pp.435-438, 2004.
DOI : 10.1109/TMM.2004.827517

S. Khan, E. Casseau, and D. Menard, Reconfigurable SWP Operator for Multimedia Processing, 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, pp.199-202, 2009.
DOI : 10.1109/ASAP.2009.13

URL : https://hal.archives-ouvertes.fr/inria-00432572

S. Khan, E. Casseau, and D. Menard, High speed reconfigurable SWP operator for multimedia processing using redundant data representation, International journal of information science and computer engineering, pp.45-52, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00480330

S. Khan, E. Casseau, and D. Menard, SWP for multimedia operator design, Proceedings of the 2nd Colloque Nationale of GDR SoC-SIP, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00432578

S. Khan, E. Casseau, and D. Menard, SWP multimedia operator design, Proceedings of the 5th international sciences of electronics, technologies of information and telecommunications conference (SETIT), 2009.
URL : https://hal.archives-ouvertes.fr/inria-00432578

P. Kitsos, G. Theodoridis, and O. Koufopavlou, An efficient reconfigurable multiplier architecture for Galois field GF(2^m), Microelectronics Journal, pp.975-980, 2003.
DOI : 10.1016/S0026-2692(03)00172-1

T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, Motion compensated interframe coding for video conferencing, Proceedings of the Nat. Telecommunication Conference, pp.5-8, 1981.

S. Krithivasan, M. J. Schulte, and J. Glossner, A subword-parallel multiplication and sum-of-squares unit, IEEE Computer Society Annual Symposium on VLSI, pp.273-274, 2004.
DOI : 10.1109/ISVLSI.2004.1339554

M. Lanuzza, S. Perri, P. Corsonello, and M. Margala, A New Reconfigurable Coarse-Grain Architecture for Multimedia Applications, Second NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2007), pp.119-126, 2007.
DOI : 10.1109/AHS.2007.10

R. B. Lee, Subword parallelism with MAX-2, IEEE Micro, pp.51-59, 1996.
DOI : 10.1109/40.526925

R. B. Lee, Multimedia extensions for general-purpose processors, 1997 IEEE Workshop on Signal Processing Systems. SiPS 97 Design and Implementation formerly VLSI Signal Processing, pp.9-23, 1997.
DOI : 10.1109/SIPS.1997.625683

R. Li, B. Zeng, and M. L. Liou, A new three step search algorithm for block motion estimation, IEEE Transactions on Circuits and Systems for Video Technology, pp.438-442, 1994.

T. Li, S. Li, and C. Shen, A novel configurable motion estimation architecture for high-efficiency MPEG-4/H.264 encoding, Proceedings of the 2005 conference on Asia South Pacific design automation, ASP-DAC '05, pp.1264-1267, 2005.
DOI : 10.1145/1120725.1121039

Z. Li, S. Peng, H. Ma, and Q. Wang, A Reconfigurable DCT Architecture for Multimedia Applications, 2008 Congress on Image and Signal Processing, pp.360-364, 2008.
DOI : 10.1109/CISP.2008.773

Y. Liao and D. B. Roberts, A high-performance and low-power 32-bit multiply-accumulate unit with single-instruction-multiple-data (SIMD) feature, IEEE Journal of Solid-State Circuits, pp.926-931, 2002.

Y. Lin and S. Tai, Fast full-search block-matching algorithm for motion compensated video compression, Proceedings of the International Conference on Pattern Recognition (ICPR '96), pp.914-921, 1996.

L. K. Liu and E. Feig, A block based gradient descent search algorithm for block motion estimation in video coding, IEEE Transactions on Circuits and Systems for Video Technology, pp.419-422, 1996.

H. Loukil, A. B. Atitallah, F. Ghozzi, M. A. Ayed, and N. Masmoudi, A Pipelined FSBM Hardware Architecture for HDTV-H.26x, International Journal of Electrical and Electronics Engineering, pp.128-135, 2008.

R. Meagher, M. Sushmitha, M. E. Rizkalla, P. Salama, and M. E. Sharkawy, VHDL Design for Real Time Motion Estimation Video Applications, Journal of Signal Processing Systems, pp.339-348, 2008.
DOI : 10.1007/s11265-008-0300-9

D. Menard, E. Casseau, S. Khan, O. Sentieys, S. Chevobbe et al., Reconfigurable Operator Based Multimedia Embedded Processor, Proceedings of the International Workshop on Reconfigurable Computing: Architectures, Tools and Applications, pp.39-49, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00432566

D. Menard and O. Sentieys, DSP Code Generation with Optimized Data Word-Length Selection, Proceedings of 8th International Workshop on Software and Compilers for Embedded Systems (SCOPES'04), 2004.
DOI : 10.1007/978-3-540-30113-4_16

URL : https://hal.archives-ouvertes.fr/inria-00482942

D. Menard, D. Chillet, and O. Sentieys, Floating-to-Fixed-Point Conversion for Digital Signal Processors, EURASIP Journal on Applied Signal Processing, vol.37, issue.8, pp.1-19, 2006.
DOI : 10.1155/ASP/2006/96421

URL : https://hal.archives-ouvertes.fr/inria-00459212

M. Nagabushanam, C. P. Raj, and S. Ramachandran, Design and implementation of parallel and pipelined distributive arithmetic based discrete wavelet transform IP core, European Journal of Scientific Research, vol.35, pp.378-392, 2009.

J. Oliver, V. Akella, and F. Chong, Efficient orchestration of sub-word parallelism in media processors, Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, SPAA '04, pp.225-234, 2004.
DOI : 10.1145/1007912.1007946

D. S. Phatak, T. Goff, and I. Koren, Constant-time addition and simultaneous format conversion based on redundant binary representations, IEEE Transactions on Computers, pp.1267-1278, 2001.
DOI : 10.1109/12.966499

L. M. Po and W. C. Ma, A novel four-step search algorithm for fast block motion estimation, IEEE Transactions on Circuits and Systems for Video Technology, pp.313-317, 1996.

A. Puri, H. M. Hang, and D. L. Schilling, An efficient block-matching algorithm for motion-compensated coding, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.1063-1066, 1987.
DOI : 10.1109/ICASSP.1987.1169777

N. Roma and L. Sousa, A New Efficient VLSI Architecture for Full Search Block Matching Motion Estimation, Proceedings of the Eleventh international conference on very large scale integration of systems on chip, pp.253-264, 2001.
DOI : 10.1007/978-0-387-35597-9_22

Y. Saito, T. Sano, M. Kato, V. Tunbunheng, Y. Yasuda et al., A Real Chip Evaluation of MuCCRA-3: A Low Power Dynamically Reconfigurable Processor Array, Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA'09), pp.283-286, 2009.

R. Sangireddy and A. K. Somani, On-Chip Adaptive Circuits for Fast Media Processing, IEEE Transactions on Circuits and Systems, pp.946-950, 2006.
DOI : 10.1109/TCSII.2006.880336

T. Sano, Y. Saito, and H. Amano, Configuration with Self-Configured Datapath: A High Speed Configuration Method for Dynamically Reconfigurable Processors

M. G. Sarwer, L. M. Po, and Q. M. Wu, Fast sum of absolute transformed difference based 4 x 4 intra-mode decision of H.264/AVC video coding standard

M. N. Nguyen, J. Pham, and . Lent, A low-power, high-speed implementation of a PowerPC TM microprocessor vector extension, Proceedings of 14th IEEE Symposium on Computer Arithmetic, pp.12-19, 2002.

J. P. Shen and M. H. Lipasti, Modern processor design fundamentals of superscalar processors, 2002.

Z. J. Shi, Subword Permutations with MIX Instructions, Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, 2005., pp.1637-1641, 2005.
DOI : 10.1109/ACSSC.2005.1600046

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.8241

I. Skliarova and A. B. Ferrari, Design and implementation of reconfigurable processor for problems of combinatorial computations, Proceedings of the Euromicro Symposium on Digital Systems Design, pp.112-119, 2001.

N. Takagi, H. Yasuura, and S. Yajima, High-Speed VLSI Multiplication Algorithm with a Redundant Binary Addition Tree, IEEE Transactions on Computers, vol.34, p.789, 1985.
DOI : 10.1109/TC.1985.1676634

M. Thornton, A signed binary addition circuit based on an alternative class of addition tables, Computers and Electrical Engineering, pp.303-315, 2003.
DOI : 10.1016/S0045-7906(01)00027-1

M. Thornton, The conversion algorithm and implementation between carry-save and binary sign-digit representations, Asian Journal of Information Technology, pp.901-906, 2005.

V. M. Tuan, N. Katsura, H. Matsutani, and H. Amano, Evaluation of a multicore reconfigurable architecture with variable core sizes, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-8, 2009.
DOI : 10.1109/IPDPS.2009.5161225

J. Vanne, E. Aho, T. D. Hämäläinen, and K. Kuusilinna, A High-Performance Sum of Absolute Difference Implementation for Motion Estimation, IEEE Transactions on Circuits and Systems for Video Technology, 2006.
DOI : 10.1109/TCSVT.2006.877150

S. Vassiliadis, E. A. Hakkennes, J. S. Wong, and G. G. Pechanek, The sum-absolute-difference motion estimation accelerator, Proceedings. 24th EUROMICRO Conference (Cat. No.98EX204), pp.559-566, 1998.
DOI : 10.1109/EURMIC.1998.708071

A. K. Verma and P. Ienne, Improved use of the carry-save representation for the synthesis of complex arithmetic circuits, IEEE/ACM International Conference on Computer Aided Design, 2004. ICCAD-2004., pp.791-798, 2004.
DOI : 10.1109/ICCAD.2004.1382683

M. Vorbach and R. Becker, Reconfigurable processor architectures for mobile phones, Proceedings International Parallel and Distributed Processing Symposium, pp.6-12, 2003.
DOI : 10.1109/IPDPS.2003.1213334

J. Wakerly, Digital Design, 2000.

A. Wang and A. Chandrakasan, A 180-mV subthreshold FFT processor using a minimum energy design methodology, IEEE Journal of Solid-State Circuits, pp.310-319, 2005.
DOI : 10.1109/JSSC.2004.837945

G. Wang, The conversion algorithm and implementation between carry-save and binary sign-digit representation, Asian Journal of Information Technology, pp.901-906, 2005.

S. Wichman and N. Goel, The Second Generation ZSP DSP. LSI Logic Corporation, 2002.

S. Wong, B. Stougie, and S. Cotofana, Alternatives in FPGA-based SAD implementations, 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings., pp.449-452, 2002.
DOI : 10.1109/FPT.2002.1188733

S. Wong, S. Vassiliadis, and S. Cotofana, A sum of absolute differences implementation in FPGA hardware, Proceedings. 28th Euromicro Conference, pp.183-188, 2002.
DOI : 10.1109/EURMIC.2002.1046155

B. F. Wu and T. L. Yu, Efficient hierarchical motion estimation algorithm and its VLSI architecture, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp.1385-1398, 2008.

S. Xu and H. Pollitt-smith, A Multi-MicroBlaze Based SOC System: From SystemC Modeling to FPGA Prototyping, 2008 The 19th IEEE/IFIP International Symposium on Rapid System Prototyping, pp.121-127, 2008.
DOI : 10.1109/RSP.2008.15

S. M. Yen, C. S. Laih, C. H. Chen, and J. Y. Lee, An efficient redundant-binary number to binary number converter, IEEE Journal of Solid-State Circuits, vol.27, pp.109-112, 1992.

S. Yeo, T. Roh, and J. Kim, High Energy Efficient Reconfigurable Processor for Mobile Multimedia, 2008 4th IEEE International Conference on Circuits and Systems for Communications, pp.618-622, 2008.
DOI : 10.1109/ICCSC.2008.137

X. Zhang and X. Shen, A Power-Efficient Floating-Point Co-processor Design, 2008 International Conference on Computer Science and Software Engineering, pp.75-78, 2008.
DOI : 10.1109/CSSE.2008.795

S. Zhu and K. K. Ma, Correction to "A new diamond search algorithm for fast block-matching motion estimation", IEEE Transactions on Image Processing, pp.287-290, 2000.
DOI : 10.1109/TIP.2000.826791