
Cryogenic Design-Technology Co-Optimization: Halo-Engineered Low-$V_{\mathrm{TH}}$ Cryo-CMOS for Refresh-Free All-Dynamic Computing

Yumeng Yuan$^{1,2\#}$, Yue Xin$^{1\#}$, Chang He$^{1\#}$, Qing Shi$^{1}$, Zheng Wang$^{3}$, Kagan Irez$^{1}$, Linfeng Xie$^{1}$, Sijie Yang$^{1}$, Yuanyuan Han$^{3}$, Zewei Wang$^{1}$, Xiaoxu Kang$^{4}$, Shaojian Hu$^{4}$, and Xufeng Kou$^{1*}$
$^{1}$ShanghaiTech University, Shanghai, China, $^{2}$Shanghai Institute of Microsystem and Information Technology, Shanghai, China,
$^{3}$Fudan University, Shanghai, China, $^{4}$Shanghai IC Research and Development Center (ICRD), Shanghai, China,
*Email: kouxf@shanghaitech.edu.cn. #Authors contributed equally to this work.

Abstract

This work demonstrates a holistic Design-Technology Co-Optimization (DTCO) strategy for energy-efficient cryogenic computing. By applying the halo doping modulation method, we develop a low-power Cryo-CMOS technology featuring $V_{\mathrm{TH}}<0.25~\mathrm{V}$ and $\sigma(V_{\mathrm{TH}})<30~\mathrm{mV}$ at low temperatures (LT). Meanwhile, the adoption of pull-down-only cryogenic dynamic logic gates realizes a 65% speed gain over the room-temperature static counterparts, and the low-leakage nature of Cryo-CMOS improves the retention time of the dynamic storage unit by a factor of $10^{4}$. On this basis, we design a refresh-free all-dynamic $3\times3$ systolic array which not only saves 25% of the layout area, but also achieves a 58% power-delay-product (PDP) reduction at 77 K.

I. Introduction

Cryogenic electronics, benefiting from both improved switching performance and enhanced driving strength, provides new opportunities to boost the speed and energy efficiency of data processing. Unfortunately, owing to the carrier freeze-out effect, CMOS transistors experience a substantial threshold voltage ($V_{\mathrm{TH}}$) drift when operated at LT, and the partially activated dopants and interfacial traps invariably exacerbate the device-to-device variation; both effects restrict supply voltage scaling and severely limit the cryogenic circuit design space [1-2]. To address these challenges, channel-doping-profile engineering can be used as an effective tuning knob to improve the performance-power trade-offs of MOSFETs in the LT region [3-4]. Apart from the $V_{\mathrm{TH}}$-optimization approach, the inherent low-leakage characteristics of Cryo-CMOS can significantly enhance the robustness of data stored in digital logic blocks, offering another degree of freedom to construct novel refresh-free cryogenic circuits and expanding the design potential for high-performance computing [5-8].
Inspired by these insights, this work reports a DTCO scheme (Fig. 1) focused on developing a low-power Cryo-CMOS process as well as incorporating device-level merits into cryogenic circuit design. By tailoring the halo implants, the optimized device exhibits a sub-0.25 V threshold voltage while retaining a high on/off ratio exceeding $10^{8}$ at 77 K, alongside a 41% reduction in device-to-device variation. Furthermore, we demonstrate that the dynamic logic architectures, including NAND/NOR, true single-phase clock (TSPC), and embedded DRAM (eDRAM) modules, enable compact layouts, faster speed, lower switching power, and reliable voltage scaling in the cryogenic temperature domain. Integrated with the cryo-dynamic processing elements (PEs), the resulting cryogenic all-dynamic systolic array achieves a 39.8% speed gain and a 29.9% power reduction compared to the static baselines.

II. Exploring the Design Space of Cryo-CMOS

Figs. 2-4 characterize the temperature dependence of key electrical parameters measured from two standard 28 nm-node NMOS and PMOS transistors. As the devices are cooled down from 298 K to 10 K, the on-state current ($I_{\mathrm{DSAT}}$) increases by 40% to 83%, primarily driven by the boosted carrier mobility (Fig. 2a). Concurrently, the suppressed thermal voltage ($U_{T}=kT/q$) at LT reduces the sub-threshold swing ($SS$) from 90 mV/dec (298 K) to 23 mV/dec (10 K), and the zero-bias leakage current ($I_{\mathrm{OFF}}$) follows an exponential scaling law $I_{\mathrm{off}}(T)=I_{\mathrm{off},0}\cdot 10^{T\cdot\eta}$, raising the device on/off ratio up to $10^{10}$ (Figs. 2b-d). On the other hand, the $V_{\mathrm{TH}}$ values extracted by the constant-current method ($I_{\mathrm{DS}}\cdot(L/W)=10^{-8}~\mathrm{A}$) are found to increase by 0.2 V to 0.3 V, indicating a 40% reduction in the available gate overdrive voltage ($V_{\mathrm{OV}}$) range (Fig. 3).
Moreover, quantitative Monte-Carlo statistical analysis of 150+ devices at the TT/FS/SF/FF/SS corners reveals that the global device variability $\sigma(\Delta V_{\mathrm{TH}})$ of NMOS (PMOS) increases from 33.9 mV (40.5 mV) to 43.6 mV (56.3 mV), mainly owing to exacerbated doping fluctuations at low temperatures (Fig. 4). In addition, the supplementary Pelgrom's law plot unveils a 62% increase in local mismatch at 77 K, which poses great challenges to cryogenic circuit stability and timing margins [9].
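The thermal-voltage argument and the leakage law above can be checked numerically. The following is a minimal sketch, not the authors' extraction flow: it assumes the textbook ideal sub-threshold swing limit $SS=\ln(10)\cdot kT/q$ (ideality factor 1, which the paper does not state), and the values of `I_OFF0` and `ETA` are hypothetical placeholders chosen only to exercise the quoted formula.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant in J/K
Q_E = 1.602176634e-19  # elementary charge in C


def thermal_voltage(temp_k: float) -> float:
    """Thermal voltage U_T = kT/q in volts."""
    return K_B * temp_k / Q_E


def ideal_ss_mv_per_dec(temp_k: float, ideality: float = 1.0) -> float:
    """Textbook sub-threshold swing limit, ideality * ln(10) * U_T, in mV/dec."""
    return ideality * math.log(10.0) * thermal_voltage(temp_k) * 1e3


def i_off(temp_k: float, i_off0: float, eta: float) -> float:
    """Leakage law quoted in the text: I_off(T) = I_off0 * 10**(T * eta)."""
    return i_off0 * 10.0 ** (temp_k * eta)


# Hypothetical fit parameters, NOT taken from the paper.
I_OFF0 = 1e-15  # extrapolated leakage at T = 0 K, in A
ETA = 0.02      # decades of leakage per kelvin

for t in (298.0, 77.0, 10.0):
    print(f"T = {t:5.1f} K  U_T = {thermal_voltage(t) * 1e3:6.3f} mV  "
          f"ideal SS = {ideal_ss_mv_per_dec(t):6.2f} mV/dec  "
          f"I_off = {i_off(t, I_OFF0, ETA):.3e} A")
```

With these constants the ideal limit is about 59 mV/dec at 298 K and about 2 mV/dec at 10 K. The measured 90 mV/dec and 23 mV/dec quoted above lie above that limit, so the ideal expression should be read as a lower bound rather than a prediction.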
Theoretically, the surface potential of a typical NMOS device is given by $\varphi_{s}=U_{T}\ln\left(N_{A}(T)/n_{i}(T)\right)$, where $N_{A}$ is the effective channel doping, and the intrinsic carrier density $n_{i}(T)\propto\exp\left(-E_{g}/(2kT)\right)$ decreases dramatically at LT, hence leading to a $V_{\mathrm{TH}}$ increase [10-11]. Under such circumstances, our TCAD simulation results in Fig. 5 suggest that reducing the halo dose by 50% can mitigate the 0.2 V $V_{\mathrm{TH}}$ shift. Meanwhile, fewer halo implants also help reduce the impurity scattering, which in turn gives rise to pronounced enhancements of the channel mobility and $I_{\mathrm{DSAT}}$ by 80% in the LT region (Fig. 6). More importantly, such a low-halo-doping strategy can effectively weaken the peak electric field at the channel edges, thereby reducing the sensitivity of the channel potential to random dopant fluctuations [12]. As confirmed by the TCAD data in Fig. 7, the low-temperature threshold-voltage distribution indeed tightens under the low-halo-dose condition, with $\sigma(V_{\mathrm{TH}})$ decreasing from 37 mV to 22 mV. In addition, although the low-$V_{\mathrm{TH}}$ device exhibits a compromised on/off ratio at room temperature, this drawback can be alleviated at LT thanks to the steep sub-threshold swing (Fig. 8).
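To illustrate why the surface-potential expression above implies a higher threshold at LT, the sketch below evaluates $\varphi_{s}=U_{T}\ln(N_{A}/n_{i})$ at a few temperatures. It rests on several assumptions that are not in the paper: a $T^{1.5}$ prefactor for $n_{i}$, a temperature-independent silicon bandgap of 1.12 eV, $n_{i}(300~\mathrm{K})=10^{10}~\mathrm{cm^{-3}}$, and a hypothetical, fully ionized $N_{A}=10^{18}~\mathrm{cm^{-3}}$ (so freeze-out of $N_{A}(T)$ is ignored).

```python
import math

K_EV = 8.617333262e-5  # Boltzmann constant in eV/K
E_G = 1.12             # assumed temperature-independent Si bandgap, eV
N_I_300 = 1.0e10       # assumed intrinsic density at 300 K, cm^-3
N_A = 1.0e18           # hypothetical, fully ionized channel doping, cm^-3


def ln_intrinsic_density(temp_k: float) -> float:
    """ln(n_i) using n_i ~ T^1.5 * exp(-E_g / (2kT)), referenced to 300 K."""
    return (math.log(N_I_300)
            + 1.5 * math.log(temp_k / 300.0)
            - E_G / (2.0 * K_EV) * (1.0 / temp_k - 1.0 / 300.0))


def surface_potential(temp_k: float, n_a: float = N_A) -> float:
    """phi_s = U_T * ln(N_A / n_i), evaluated in log space to avoid underflow."""
    u_t = K_EV * temp_k  # kT/q in volts, because k is expressed in eV/K
    return u_t * (math.log(n_a) - ln_intrinsic_density(temp_k))


for t in (300.0, 77.0, 10.0):
    print(f"T = {t:5.1f} K  phi_s = {surface_potential(t):.3f} V")
```

Working in log space matters here because $n_{i}$ becomes vanishingly small at cryogenic temperatures. Under these assumptions $\varphi_{s}$ rises monotonically as the device is cooled, which is the direction of the $V_{\mathrm{TH}}$ shift described in the text; the magnitude depends on the assumed parameters.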
Accordingly, by taking the above low-temperature device physics into consideration, we have developed a channel doping profile optimized for cryogenic operation. As visualized in Fig. 9, by tuning the halo implant dose within the $10^{15}~\mathrm{atom/cm^{3}}$ range, the targeted cryogenic transistor matches the room-temperature $V_{\mathrm{TH}}$ value of the 28 nm HKMG LVT device with an identical gate geometry, while simultaneously improving both the zero-bias leakage current and the $V_{\mathrm{TH}}$ variation. Consistent with our TCAD simulation results, the fabricated device demonstrates near-ideal cryogenic performance at 77 K, characterized by $V_{\mathrm{TH}}<0.25~\mathrm{V}$, $I_{\mathrm{OFF}}<4~\mathrm{pA/\mu m}$, $I_{\mathrm{DSAT}}>1.4~\mathrm{mA/\mu m}$, and $SS<35~\mathrm{mV/dec}$, thus accomplishing the design goal of the Cryo-CMOS technology.

III. Cryogenic Refresh-Free All-Dynamic Logic

Static CMOS logic gates employ complementary pull-up (PUN) and pull-down (PDN) networks to ensure a fully restored output state for every input vector, yet this dual-network structure doubles the transistor count (Fig. 11a). In contrast, dynamic logic gates implement the same logic function with only one network (either PUN or PDN), yielding an $(N+2)$-topology that reduces node capacitance and accelerates switching. However, the non-negligible subthreshold leakage mandates frequent refresh cycles during dynamic logic operations at room temperature, which inevitably increases both power consumption and circuit complexity. To resolve this limitation, we introduce the refresh-free cryogenic dynamic logic, which utilizes the low-leakage feature of Cryo-CMOS to extend the retention time, allowing the gate to operate in a quasi-static mode at LT without refresh overhead (Fig. 11b). By co-optimizing the halo-engineered low-$V_{\mathrm{TH}}$ Cryo-CMOS devices with this cryo-dynamic architecture, we can subsequently explore cryogenic logic and memory modules suitable for energy-efficient computing systems.
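The transistor-count argument above can be made concrete with a small sketch. It assumes the conventional reading of the $(N+2)$-topology, namely $N$ logic transistors in a single network plus one clocked precharge device and one clocked evaluation (foot) device; the paper does not spell out this breakdown, and the function names are illustrative only.

```python
def static_cmos_transistors(n_inputs: int) -> int:
    """Static CMOS gate: one PUN and one PDN transistor per input, 2N in total."""
    return 2 * n_inputs


def dynamic_transistors(n_inputs: int) -> int:
    """Dynamic gate: N logic transistors plus precharge and evaluation devices."""
    return n_inputs + 2


for n in (2, 3, 4, 8):
    s, d = static_cmos_transistors(n), dynamic_transistors(n)
    print(f"N = {n}: static = {s}, dynamic = {d}, saved = {s - d}")
```

For a 2-input gate the two counts tie, so the device-count saving only appears at higher fan-in. Even at $N=2$, each input of the dynamic gate drives a single transistor gate instead of two, which is one source of the reduced capacitance mentioned in the text.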
Specifically, given the superior driving strength of the NMOS device at LT (Fig. 2a), the cryo-dynamic logic gates all adopt the PDN-only configuration. To quantify the benefits of dynamic circuits at low temperatures, we have first designed a multi-duty-cycle clock circuit consisting of both dynamic and static versions of NAND/NOR gates using the optimized 28 nm-node Cryo-CMOS technology (Fig. 12). Measurement results (Fig. 13) validate that the cryo-dynamic NAND (NOR) gates achieve 24% (35%) lower delay than their static counterparts at 77 K, alongside a 65% (66%) improvement over room-temperature static baselines. Strikingly, the dual power-saving mechanisms of Cryo-CMOS, namely diminished zero-bias leakage (reducing idle power) and steep subthreshold swing (minimizing switching energy), collectively yield an overall power reduction of 57.1% for the taped-out chip at 77 K (Fig. 14). Moreover, Fig. 15 shows that the cryo-dynamic NAND gate achieves the same speed at $V_{\mathrm{DD}}=0.8~\mathrm{V}$ while consuming only $0.57\times$ the dynamic power of the 0.9 V / 298 K static scenario, underscoring the combined voltage-scaling and efficiency benefits.
Apart from logic gates, we have also adopted the same cryo-dynamic concept to design storage units. For instance, the enhanced $I_{\mathrm{DSAT}}$ enables us to scale down the output PMOS width from 800 nm to 200 nm so that the overall layout area of the dynamic TSPC register is reduced by 25.3%; meanwhile, the exponential $I_{\mathrm{OFF}}(T)$ relationship (Fig. 2b) boosts the retention time by a factor of $4.2\times10^{4}$ (from 37.8 ns at 298 K to 1.59 ms at 77 K). As a result, when operated at matched speeds, the optimized design lowers the power by 40% compared to the static D flip-flop, demonstrating better frequency-power trade-offs at 77 K (Fig. 17). In view of cache design, by removing the pull-up PMOS transistors from the 6T-SRAM structure, we develop a compact cryogenic 4T-eDRAM, which saves 16.3% bitcell area while maintaining a comparable 25 ps read latency at 77 K (Fig. 18a). Likewise, experimental validation on a fabricated 4 Kb eDRAM chip (Fig. 18b) confirms a prolonged retention time of up to 3.9 ms at $V_{\mathrm{DD}}=0.7~\mathrm{V}$ (Fig. 18c), thereby manifesting the quasi-static storage mode at LT. In addition, Fig. 19 depicts the proposed PE architecture, where pull-up- and pull-down-configured cryo-dynamic full-adder (FA) units are interleaved in a staggered placement to optimize routing and signal integrity.
Crucially, the 135 $\mu$s low-temperature PE retention time exceeds the critical-path propagation delay (3 ns) by four orders of magnitude, which in turn warrants refresh-free PE operation. Consequently, the comparative analysis of Fig. 20 reveals that the cryo-dynamic PE outperforms its static counterpart, with a 39.6% power reduction, a 49.8% speed gain, a 69.7% lower PDP, and a 21.4% area reduction.
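The retention and PDP figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the one assumption is that "speed gain" is read as a fractional reduction in delay, so that the PDP ratio is the product of the remaining power and remaining delay fractions.

```python
def pdp_ratio(power_reduction: float, delay_reduction: float) -> float:
    """Remaining power-delay product as a fraction of the baseline.

    Assumes 'speed gain' means a fractional reduction in delay.
    """
    return (1.0 - power_reduction) * (1.0 - delay_reduction)


# TSPC register retention: 37.8 ns at 298 K versus 1.59 ms at 77 K.
tspc_gain = 1.59e-3 / 37.8e-9

# PE retention (135 us) versus critical-path delay (3 ns) at LT.
pe_margin = 135e-6 / 3e-9

# PE power reduction of 39.6% combined with a speed gain of 49.8%.
pe_pdp = pdp_ratio(0.396, 0.498)

print(f"TSPC retention gain : {tspc_gain:.3e}")
print(f"PE retention margin : {pe_margin:.3e}")
print(f"PE PDP ratio        : {pe_pdp:.3f} ({(1.0 - pe_pdp) * 100:.1f}% lower)")
```

Under this reading, the retention gain comes out at roughly $4.2\times10^{4}$, the PE margin at $4.5\times10^{4}$, and the PDP ratio at about 0.30, all in line with the values reported above.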

IV. Conclusion

Finally, by capitalizing on the device- and circuit-level advantages of Cryo-CMOS, we design a cryogenic refresh-free all-dynamic $3\times3$ systolic array, which integrates the optimized cryo-dynamic PEs, TSPC registers, and eDRAM modules (Fig. 21). Benchmarked against a room-temperature static-logic-based implementation, the post-layout simulation results show that our all-dynamic cryogenic systolic array achieves a 25% area reduction while delivering a consistently lower PDP ($0.42\times$ to $0.51\times$) across the entire supply voltage range (Figs. 22-23). The cryogenic DTCO framework elaborated in this work unveils a synergistic device-circuit-system roadmap for scalable cryogenic computing platforms.

Acknowledgment

This work is supported by the National Key R&D Program of China (2023YFB4404000), Shanghai Oriental Talent Program, and Zhangjiang Lab Strategic Program.

REFERENCES

[1] C. Zota et al., Nat. Electron. (2024).
[2] J. Pineda et al., JSSC (2004).
[3] K. Yılmaz et al., TED (2022).
[4] W. Cao et al., Nature (2023).
[5] Y. Shu et al., JSSC (2024).
[6] R. A. Damsteegt et al., JSSC (2024).
[7] D. Prasad et al., IEDM (2022).
[8] H.-L. Chiang et al., IEDM (2021).
[9] G. Kiene et al., JSSC (2023).
[10] C. Enz et al., IEDM (2020).
[11] H. Su et al., ESSERC (2024).
[12] J. A. Croon et al., ESSDERC (2002).

Fig. 1. Overview of the proposed cryogenic CMOS DTCO framework. Device level: halo doping engineering is used to modulate the $V_{\mathrm{TH}}$ shift and reduce the device variation, while retaining a high on/off ratio at 77 K. Circuit level: cryogenic dynamic architectures with fast speed, reduced transistor count, and low dynamic power. System level: low leakage and enhanced driving capability enable the implementation of a refresh-free all-dynamic cryogenic system.

Fig. 2. Temperature dependence of key device parameters: (a) $I_{\mathrm{DSAT}}$, (b) $SS$, (c) $I_{\mathrm{off}}$, and (d) on/off ratio from two 28 nm-node NMOS and PMOS transistors.

Fig. 3. Carrier freeze-out effect-induced $V_{\mathrm{TH}}$ shift at low temperatures. $\Delta V_{\mathrm{TH}}$ is found to be 0.1 V to 0.3 V at 77 K.

Fig. 4. Tracking plot of $V_{\mathrm{TH}}$ at different corners at $T=298$ K (red) and 77 K (blue). Inset: Pelgrom's plot unveils an exacerbated local mismatch at LT.

Fig. 5. TCAD simulation results of the halo implantation dose-dependent threshold voltage shift at different temperatures.

Fig. 6. Evolution of $I_{\mathrm{DSAT}}$ with respect to the halo doping level, where the device driving strength is enhanced at low temperatures.

Fig. 7. TCAD simulation suggests that the lower halo implant strategy can narrow the $V_{\mathrm{TH}}$ distribution thanks to the reduced sensitivity of the channel potential to random dopant fluctuations.

Fig. 8. The zero-bias leakage issue in low-$V_{\mathrm{TH}}$ devices can be alleviated at LT.

Fig. 9. Co-optimization of the cryogenic CMOS device through channel-doping-profile engineering guided by TCAD simulation. By adjusting the halo implant dose within the $10^{15}~\mathrm{atom/cm^{3}}$ range, (a) the threshold voltage is tuned in the [0.1 V, 0.3 V] region at 77 K, (b) the zero-bias leakage current is well below $10^{-13}$ A, and (c) the device-to-device variation is as low as $\sigma(V_{\mathrm{TH}})<30~\mathrm{mV}$. Reference and optimized cryo-CMOS devices are denoted by circles ($\circ$) and black stars ($\star$), respectively.

Fig. 10. Measured $I_{\mathrm{DS}}$-$V_{\mathrm{GS}}$ transfer characteristics of the optimized low-$V_{\mathrm{TH}}$ Cryo-CMOS device at 77 K (blue) and 298 K (red). Inset: TEM image of the fabricated transistor.