# Subhankar Pal

## contact information

Postdoctoral Researcher, Efficient and Reliable Systems

Thomas J. Watson Research Center, Yorktown Heights, NY USA

## links

**2022**

Versa: A 36-Core Systolic Multiprocessor With Dynamically Reconfigurable Interconnect and Memory

Sung Kim, Morteza Fayazi, Alhad Daftardar, Kuan-Yu Chen, Jielun Tan, Subhankar Pal, Tutu Ajayi, Yan Xiong, Trevor Mudge, Chaitali Chakrabarti, David Blaauw, Ronald Dreslinski, Hun-Seok Kim

*IEEE Journal of Solid-State Circuits* *57*(*4*), 986-998, 2022

HetSched: Quality-of-Mission Aware Scheduling for Autonomous Vehicle SoCs

Aporva Amarnath, Subhankar Pal, Hiwot Kassa, Augusto Vega, Alper Buyuktosunoglu, Hubertus Franke, John-David Wellman, Ronald Dreslinski, Pradip Bose

arXiv, 2022

OnSRAM: Efficient Inter-Node On-Chip Scratchpad Management in Deep Learning Accelerators

Subhankar Pal, Swagath Venkataramani, Viji Srinivasan, Kailash Gopalakrishnan

*ACM Trans. Embed. Comput. Syst.*, Association for Computing Machinery, 2022 (Just Accepted)

**2021**

A Holistic Solution for Reliability of 3D Parallel Systems

Javad Bagherzadeh, Aporva Amarnath, Jielun Tan, Subhankar Pal, Ronald G. Dreslinski

*J. Emerg. Technol. Comput. Syst.* *18*(*1*), Association for Computing Machinery, 2021

Towards Closing the Programmability-Efficiency Gap using Software-Defined Hardware

Subhankar Pal

2021

closing, software, computer hardware, computer science, adaptive hardware

SparseAdapt: Runtime Control for Sparse Linear Algebra on a Reconfigurable Accelerator

Subhankar Pal, Aporva Amarnath, Siying Feng, Michael O'Boyle, Ronald Dreslinski, Christophe Dubach

control reconfiguration, cache, synchronization, oracle, linear algebra, parallel computing, multiplication, granularity, shared resource, computer science

*MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture*, *pp. 1005-1021*, ACM, 2021

Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators

Subhankar Pal, Swagath Venkataramani, Viji Srinivasan, Kailash Gopalakrishnan

*2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)*, *pp. 240-242*, IEEE

Heterogeneity-Aware Scheduling on SoCs for Autonomous Vehicles

Aporva Amarnath, Subhankar Pal, Hiwot Tadese Kassa, Augusto Vega, Alper Buyuktosunoglu, Hubertus Franke, John-David Wellman, Ronald Dreslinski, Pradip Bose

scheduling, design space exploration, dynamic priority scheduling, distributed computing, space exploration, reduction, computer science, mission time, multiple applications, processor scheduling

*IEEE Computer Architecture Letters* *20*(*2*), 82-85, Institute of Electrical and Electronics Engineers (IEEE), 2021

CoSPARSE: A Software and Hardware Reconfigurable SpMV Framework for Graph Analytics

Siying Feng, Jiawen Sun, Subhankar Pal, Xin He, Kuba Kaszyk, Dong-hyeon Park, Magnus Morton, Trevor Mudge, Murray Cole, Michael F. P. O'Boyle, Chaitali Chakrabarti, Ronald Dreslinski

software, computer architecture, computer science, graph analytics

*58th Design Automation Conference*, ACM Association for Computing Machinery, 2021

Versa: A Dataflow-Centric Multiprocessor with 36 Systolic ARM Cortex-M4F Cores and a Reconfigurable Crossbar-Memory Hierarchy in 28nm

Sung Kim, Morteza Fayazi, Alhad Daftardar, Kuan-Yu Chen, Jielun Tan, Subhankar Pal, Tutu Ajayi, Yan Xiong, Trevor Mudge, Chaitali Chakrabarti, David Blaauw, Ronald Dreslinski, Hun-Seok Kim

memory hierarchy, versa, central processing unit, dataflow, mobile processor, arm architecture, synchronization, control reconfiguration, parallel computing, computer science

*2021 Symposium on VLSI Circuits*, *pp. 1-2*, IEEE

**2020**

R2D3: A Reliability Engine for 3D Parallel Systems

Javad Bagherzadeh, Aporva Amarnath, Jielun Tan, Subhankar Pal, Ronald G. Dreslinski

failure rate, reliability, performance improvement, controller, reliability engineering, baseline, computer science, degradation, reliability management

*2020 57th ACM/IEEE Design Automation Conference (DAC)*, *pp. 1-6*, IEEE

STOMP: A Tool for Evaluation of Scheduling Policies in Heterogeneous Multi-Processors

Augusto Vega, Aporva Amarnath, John-David Wellman, Hiwot Kassa, Subhankar Pal, Hubertus Franke, Alper Buyuktosunoglu, Ronald G. Dreslinski, Pradip Bose

scheduling, distributed computing, chip, computer science, criticality, general purpose, homogeneous

*arXiv preprint arXiv:2007.14371*, 2020

HETSIM: Simulating Large-Scale Heterogeneous Systems using a Trace-driven, Synchronization and Dependency-Aware Framework

Subhankar Pal, Kuba Kaszyk, Siying Feng, Bjorn Franke, Murray Cole, Michael O'Boyle, Trevor Mudge, Ronald G. Dreslinski

emulation, synchronization, trace, software, distributed computing, dependency, chip, computer science, power, scale

*2020 IEEE International Symposium on Workload Characterization (IISWC)*, *pp. 13-24*, IEEE

Accelerating Deep Neural Network Computation on a Low Power Reconfigurable Architecture

Yan Xiong, Jian Zhou, Subhankar Pal, David Blaauw, Hun-Seok Kim, Trevor Mudge, Ronald Dreslinski, Chaitali Chakrabarti

cache, recurrent neural network, node, artificial neural network, bridging, control reconfiguration, computer architecture, transformer, kernel, computer science

*2020 IEEE International Symposium on Circuits and Systems (ISCAS)*, *pp. 1-5*, IEEE

Accelerating Linear Algebra Kernels on a Massively Parallel Reconfigurable Architecture

Anuraag Soorishetty, Jian Zhou, Subhankar Pal, David Blaauw, Hun-Seok Kim, Trevor Mudge, Ronald Dreslinski, Chaitali Chakrabarti

lu decomposition, cache, qr decomposition, linear algebra, control reconfiguration, massively parallel, triangular matrix, transformer, matrix, solver, parallel computing, computer science

*ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, *pp. 1558-1562*, IEEE

A 7.3 M Output Non-Zeros/J, 11.7 M Output Non-Zeros/GB Reconfigurable Sparse Matrix-Matrix Multiplication Accelerator

Dong-Hyeon Park, Subhankar Pal, Siying Feng, Paul Gao, Jielun Tan, Austin Rovinski, Shaolin Xie, Chun Zhao, Aporva Amarnath, Timothy Wesley, Jonathan Beaumont, Kuan-Yu Chen, Chaitali Chakrabarti, Michael Bedford Taylor, Trevor Mudge, David Blaauw, Hun-Seok Kim, Ronald G. Dreslinski

matrix multiplication, multiplication, sparse matrix, memory hierarchy, cache, graph, chip, spectral efficiency, discrete mathematics, physics

*IEEE Journal of Solid-State Circuits* *55*(*4*), 933-944, Institute of Electrical and Electronics Engineers (IEEE), 2020

Sparse-TPU: adapting systolic arrays for sparse matrices

Xin He, Subhankar Pal, Aporva Amarnath, Siying Feng, Dong-Hyeon Park, Austin Rovinski, Haojie Ye, Yuhan Chen, Ronald Dreslinski, Trevor Mudge

sparse matrix, systolic array, matrix, multiplication, kernel, floating point, overhead, throughput, computational science, computer science

*Proceedings of the 34th ACM International Conference on Supercomputing*, ACM, 2020

Transmuter: Bridging the Efficiency Gap using Memory and Dataflow Reconfiguration

Subhankar Pal, Siying Feng, Dong-hyeon Park, Sung Kim, Aporva Amarnath, Chi-Sheng Yang, Xin He, Jonathan Beaumont, Kyle May, Yan Xiong, Kuba Kaszyk, John Magnus Morton, Jiawen Sun, Michael O'Boyle, Murray Cole, Chaitali Chakrabarti, David Blaauw, Hun-Seok Kim, Trevor Mudge, Ronald Dreslinski

dataflow, hardware acceleration, dennard scaling, control reconfiguration, data structure, throughput, field programmable gate array, dynamic data, computer architecture, computer science

*Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques*, *pp. 175-190*, ACM Association for Computing Machinery, 2020

**2019**

Parallelism Analysis of Prominent Desktop Applications: An 18-Year Perspective

Siying Feng, Subhankar Pal, Yichen Yang, Ronald G. Dreslinski

task parallelism, dennard scaling, multi core processor, clock rate, software, thermal design power, parallelism, computer architecture, computer science, state

*2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)*, *pp. 202-211*, IEEE

A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm

Subhankar Pal, Dong-hyeon Park, Siying Feng, Paul Gao, Jielun Tan, Austin Rovinski, Shaolin Xie, Chun Zhao, Aporva Amarnath, Timothy Wesley, Jonathan Beaumont, Kuan-Yu Chen, Chaitali Chakrabarti, Michael Taylor, Trevor Mudge, David Blaauw, Hun-Seok Kim, Ronald Dreslinski

matrix multiplication, sparse matrix, multiplication, memory hierarchy, central processing unit, chip, cache, system on a chip, topology, physics

*2019 Symposium on VLSI Technology*, IEEE

**2018**

OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator

Subhankar Pal, Jonathan Beaumont, Dong-Hyeon Park, Aporva Amarnath, Siying Feng, Chaitali Chakrabarti, Hun-Seok Kim, David Blaauw, Trevor Mudge, Ronald Dreslinski

matrix multiplication, multiplication algorithm, sparse matrix, speedup, xeon, massively parallel, high bandwidth memory, spmd, parallel computing, computer science

*2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)*, *pp. 724-736*, IEEE Computer Society

**2017**

A carbon nanotube transistor based RISC-V processor using pass transistor logic

Aporva Amarnath, Siying Feng, Subhankar Pal, Tutu Ajayi, Austin Rovinski, Ronald G. Dreslinski

pass transistor logic, carbon nanotube field effect transistor, logic gate, cmos, transistor, threshold voltage, critical path method, adder, electrical engineering, electronic engineering, engineering

*2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)*, *pp. 1-6*, IEEE

**2014**

A New Design of an N-Bit Reversible Arithmetic Logic Unit

Subhankar Pal, Chetan Vudadha, P. Sai Phaneendra, Sreehari Veeramachaneni, Srinivas Mandalika

adder, serial binary adder, carry save adder, pass transistor logic, arithmetic logic unit, three input universal logic gate, dissipation, quantum gate, arithmetic, electronic engineering, mathematics

*2014 Fifth International Symposium on Electronic System Design*, *pp. 224-225*, IEEE