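The MI50 is a passively cooled card, so using it in an ordinary consumer PC means retrofitting active cooling.
My setup is a 3D-printed shroud plus a blower fan, with the fan's 4-pin connector plugged into a motherboard fan header.
Because the air is ducted straight through the heatsink, cooling is quite good. A blower spins very fast, though, and at full speed it is very loud.
On Windows, the FanControl tool lets you configure temperature-based fan speed control in a GUI. When the MI50 is deployed on Linux, can the fan speed also follow the GPU temperature automatically and start with the system? Yes, it can.
First, install lm-sensors: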
sudo apt install lm-sensors
# Run sensor detection and answer YES to every prompt
sudo sensors-detect
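# This eventually detects every chip that can report temperatures or control PWM.
# When asked whether to add the detected chip drivers to the module configuration automatically, answer yes as well: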
Do you want to add these lines automatically to /etc/modules? (yes/NO)
# Reboot the system
reboot
After the reboot, run:
sudo sensors
lm96163-i2c-13-4c
Adapter: SMBus I801 adapter at f000
temp1: +42.0°C (high = +70.0°C)
temp2: +50.6°C (low = +0.0°C, high = +85.0°C)
(crit = +110.0°C, hyst = +100.0°C) sensor = CPU diode
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +29.0°C (high = +90.0°C, crit = +100.0°C)
Core 0: +24.0°C (high = +90.0°C, crit = +100.0°C)
Core 1: +23.0°C (high = +90.0°C, crit = +100.0°C)
Core 2: +24.0°C (high = +90.0°C, crit = +100.0°C)
Core 3: +24.0°C (high = +90.0°C, crit = +100.0°C)
Core 4: +24.0°C (high = +90.0°C, crit = +100.0°C)
Core 5: +25.0°C (high = +90.0°C, crit = +100.0°C)
Core 6: +23.0°C (high = +90.0°C, crit = +100.0°C)
Core 8: +22.0°C (high = +90.0°C, crit = +100.0°C)
Core 9: +23.0°C (high = +90.0°C, crit = +100.0°C)
Core 10: +23.0°C (high = +90.0°C, crit = +100.0°C)
Core 11: +24.0°C (high = +90.0°C, crit = +100.0°C)
Core 12: +24.0°C (high = +90.0°C, crit = +100.0°C)
Core 13: +23.0°C (high = +90.0°C, crit = +100.0°C)
Core 14: +24.0°C (high = +90.0°C, crit = +100.0°C)
nvme-pci-0200
Adapter: PCI adapter
Composite: +39.9°C (low = -273.1°C, high = +84.8°C)
(crit = +84.8°C)
Sensor 1: +39.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +34.9°C (low = -273.1°C, high = +65261.8°C)
nct6793-isa-0a20
Adapter: ISA adapter
in0: 1.81 V (min = +0.00 V, max = +1.74 V) ALARM
in1: 1.22 V (min = +0.00 V, max = +0.00 V) ALARM
in2: 3.33 V (min = +0.00 V, max = +0.00 V) ALARM
in3: 3.34 V (min = +0.00 V, max = +0.00 V) ALARM
in4: 248.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in5: 128.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in6: 1.02 V (min = +0.00 V, max = +0.00 V) ALARM
in7: 3.31 V (min = +0.00 V, max = +0.00 V) ALARM
in8: 3.25 V (min = +0.00 V, max = +0.00 V) ALARM
in9: 1.06 V (min = +0.00 V, max = +0.00 V) ALARM
in10: 152.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in11: 128.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in12: 1.22 V (min = +0.00 V, max = +0.00 V) ALARM
in13: 1.01 V (min = +0.00 V, max = +0.00 V) ALARM
in14: 168.00 mV (min = +0.00 V, max = +0.00 V) ALARM
fan1: 0 RPM (min = 0 RPM)
fan2: 1296 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +117.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = thermistor
CPUTIN: +49.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermal diode
AUXTIN0: +25.5°C sensor = thermistor
AUXTIN1: +127.0°C sensor = thermistor
AUXTIN2: +127.0°C sensor = thermistor
AUXTIN3: +127.0°C sensor = CPU diode
PECI Agent 0: +8.5°C
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
PCH_MCH_TEMP: +0.0°C
Agent0 Dimm0 : +0.0°C
TSI2_TEMP: +3892314.0°C
TSI3_TEMP: +3892314.0°C
TSI4_TEMP: +3892314.0°C
TSI5_TEMP: +3892314.0°C
TSI6_TEMP: +3892314.0°C
TSI7_TEMP: +3892314.0°C
intrusion0: OK
intrusion1: ALARM
beep_enable: disabled
amdgpu-pci-0500
Adapter: PCI adapter
vddgfx: 743.00 mV
fan1: 0 RPM (min = 0 RPM, max = 3850 RPM)
edge: +49.0°C (crit = +100.0°C, hyst = -273.1°C)
(emerg = +105.0°C)
junction: +53.0°C (crit = +100.0°C, hyst = -273.1°C)
(emerg = +105.0°C)
mem: +54.0°C (crit = +94.0°C, hyst = -273.1°C)
(emerg = +99.0°C)
PPT: 30.00 W (cap = 300.00 W)
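The chip names in this output (amdgpu, nct6793) correspond to directories under /sys/class/hwmon, and the hwmonN numbering is not guaranteed to stay the same across reboots or kernel updates. A minimal sketch for resolving a directory by chip name, instead of hardcoding the number, is shown here. The function name find_hwmon_by_name and its optional second argument are illustrative assumptions, not part of the original script.

```shell
#!/usr/bin/env bash
# Print the hwmon directory whose "name" file matches the given chip name.
# $1: chip name as reported by the driver (e.g. amdgpu, nct6793)
# $2: base directory; defaults to /sys/class/hwmon (overridable for testing)
# Hypothetical helper, not taken from the original script.
find_hwmon_by_name() {
    local wanted="$1" base="${2:-/sys/class/hwmon}" dir
    for dir in "$base"/hwmon*; do
        [ -r "$dir/name" ] || continue
        if [ "$(cat "$dir/name")" = "$wanted" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example: look up the two chips that matter for this setup.
for chip in amdgpu nct6793; do
    if dir="$(find_hwmon_by_name "$chip")"; then
        echo "$chip -> $dir"
    else
        echo "$chip -> not found" >&2
    fi
done
```

With more than one amdgpu device installed, this returns only the first match, so a multi-GPU machine would need an extra check (for example on the PCI address).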
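Based on this information, I wrote a script that reads the amdgpu temperature every 5 seconds, keeps the PWM value within a configured range, and starts automatically at boot. The script also works for other cards retrofitted with active cooling, such as the NVIDIA Tesla P100 and V100. Here is what it looks like when running: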
10:14:21 root@dev ~ → journalctl -f -u fan-control-gpu.service
Sep 24 10:13:20 dev fan-control-gpu.sh[16433]: [2025-09-24 10:13:20] 🌡️ 温度: 67°C → PWM: 151
Sep 24 10:13:25 dev fan-control-gpu.sh[16441]: [2025-09-24 10:13:25] 🌡️ 温度: 68°C → PWM: 155
Sep 24 10:13:30 dev fan-control-gpu.sh[16449]: [2025-09-24 10:13:30] 🌡️ 温度: 69°C → PWM: 158
Sep 24 10:13:35 dev fan-control-gpu.sh[16457]: [2025-09-24 10:13:35] 🌡️ 温度: 70°C → PWM: 162
Sep 24 10:13:40 dev fan-control-gpu.sh[16465]: [2025-09-24 10:13:40] 🌡️ 温度: 71°C → PWM: 166
Sep 24 10:13:55 dev fan-control-gpu.sh[16494]: [2025-09-24 10:13:55] 🌡️ 温度: 72°C → PWM: 170
Sep 24 10:14:15 dev fan-control-gpu.sh[16523]: [2025-09-24 10:14:15] 🌡️ 温度: 73°C → PWM: 173
Sep 24 10:14:20 dev fan-control-gpu.sh[16547]: [2025-09-24 10:14:20] 🌡️ 温度: 68°C → PWM: 155
Sep 24 10:14:25 dev fan-control-gpu.sh[16560]: [2025-09-24 10:14:25] 🌡️ 温度: 65°C → PWM: 143
Sep 24 10:14:30 dev fan-control-gpu.sh[16568]: [2025-09-24 10:14:30] 🌡️ 温度: 63°C → PWM: 136
Sep 24 10:14:35 dev fan-control-gpu.sh[16578]: [2025-09-24 10:14:35] 🌡️ 温度: 61°C → PWM: 128
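The temperature/PWM pairs in this log are consistent with a plain linear interpolation using integer (truncating) arithmetic, assuming 40°C to 80°C is mapped onto PWM 50 to 200 as in the script's default parameters. The sketch below shows that mapping; the function name temp_to_pwm is an illustrative assumption and the author's script may implement it differently.

```shell
#!/usr/bin/env bash
# Assumed defaults, matching the parameter values given in this article.
MIN_TEMP=40000   # millidegrees Celsius (40°C)
MAX_TEMP=80000   # millidegrees Celsius (80°C)
MIN_PWM=50
MAX_PWM=200

# Map a temperature in millidegrees to a PWM value by linear interpolation,
# clamped to [MIN_PWM, MAX_PWM]. Bash integer division truncates the result.
# Hypothetical helper, not taken from the original script.
temp_to_pwm() {
    local temp="$1"
    if [ "$temp" -le "$MIN_TEMP" ]; then
        echo "$MIN_PWM"
    elif [ "$temp" -ge "$MAX_TEMP" ]; then
        echo "$MAX_PWM"
    else
        echo $(( MIN_PWM + (temp - MIN_TEMP) * (MAX_PWM - MIN_PWM) / (MAX_TEMP - MIN_TEMP) ))
    fi
}

# Spot-check against three lines of the log above.
for t in 61000 67000 73000; do
    echo "$((t / 1000))°C -> PWM $(temp_to_pwm "$t")"
done
# Prints:
# 61°C -> PWM 128
# 67°C -> PWM 151
# 73°C -> PWM 173
```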
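Script parameters:
PWM_DEVICE="/sys/class/hwmon/hwmon3/pwm2" # PWM control file
PWM_ENABLE="/sys/class/hwmon/hwmon3/pwm2_enable" # PWM enable (mode) file
TEMP_SENSOR="/sys/class/hwmon/hwmon1/temp1_input" # temperature sensor file
INTERVAL_TIME=5 # poll every 5 seconds; adjust as needed, e.g. 1 for once per second
MIN_TEMP=40000 # 40°C; the unit is millidegrees Celsius
MAX_TEMP=80000 # 80°C, the temperature mapped to the maximum fan speed
MIN_PWM=50 # minimum speed; 0 is allowed but may stop the fan entirely, depending on how the fan handles PWM
MAX_PWM=200 # maximum speed; the upper limit is 255. A blower at full speed is very loud, but raise this to 255 if temperatures still cannot be kept down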
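Before letting a script drive the fan, it can help to confirm by hand that the chosen PWM file really controls the GPU blower. In the kernel's hwmon sysfs interface, writing 1 to pwmN_enable selects manual control and pwmN then accepts a duty value from 0 to 255. The helper below is a minimal sketch of that sequence; the name set_pwm is an illustrative assumption, and writing to the real files requires root.

```shell
#!/usr/bin/env bash
# Switch a PWM channel to manual mode and write a duty value (0-255).
# $1: pwmN_enable file, $2: pwmN file, $3: requested value
# Hypothetical helper, not taken from the original script.
set_pwm() {
    local enable_file="$1" pwm_file="$2" value="$3"
    # Clamp the request to the valid 0-255 range.
    if [ "$value" -lt 0 ]; then value=0; fi
    if [ "$value" -gt 255 ]; then value=255; fi
    echo 1 > "$enable_file" || return 1   # 1 = manual fan speed control
    echo "$value" > "$pwm_file"
}

# Example (run as root, paths as configured above):
# set_pwm /sys/class/hwmon/hwmon3/pwm2_enable /sys/class/hwmon/hwmon3/pwm2 128
```

After the write, the fan speed reported by `sensors` should change within a few seconds; if it does not, the PWM file probably belongs to a different fan header.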
Bash script: