A puzzling 100% disk-space alert, and what it reveals about the fundamental difference between df -h and du -sh *

阿新 • Published: 2019-02-07

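Preface:
This morning, after a disk alert, I cleared out the Tomcat and nginx logs with truncation commands along the lines of echo "" > show_web-error.log or > show_web-debug.log, then removed some tar.gz packages with rm -rf, which freed up 30G. I also turned off Tomcat's debug output. Now the alert has fired again: the disk is at 100%. What is going on?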

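1. Running df -h confirmed that the disk really was at 100%:
[[email protected] ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      113G  113G     0 100% /
/dev/sda1              99M   13M   82M  14% /boot
tmpfs                 8.8G     0  8.8G   0% /dev/shm
It really was at 100%, so the next step was to look under /.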

2. Checking the root directory with du -sh *
[[email protected] ~]# cd /
[[email protected] /]# du -sh *
7.8M bin
6.9M boot
131M data
196K dev
111M etc
178M home
131M lib
23M lib64
119M logs
16K lost+found
8.0K media
0 misc
8.0K mnt
0 net
0 nohup.out
3.8G opt
15M pcre-8.33
2.1M pcre-8.33.zip
du: cannot access "proc/11575/task/11575/fd/1565": No such file or directory
du: cannot access "proc/15403/task/14464/fd/625": No such file or directory
0 proc
1.4G product
153M repo
143M root
37M sbin
8.0K selinux
363M soft
8.0K srv
0 sys
20K temp
100K tftpboot
2.1G tmp
8.6G usr
184M var
30M varnish-3.0.3
56M zabbix-2.0.8
[[email protected] /]#
Everything under / adds up to less than 30G, yet df -h insists on 100%. Where does the difference come from?
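To put the two numbers side by side instead of adding up the du output by eye, something like the following could be used. It is only a sketch: compare_df_du is a name made up for this illustration, and it assumes a POSIX shell with the usual df, du and awk.

```shell
#!/bin/sh
# Compare what df and du report for one mount point (values in KiB).
# compare_df_du is a hypothetical helper name, not a standard tool.
compare_df_du() {
    mount_point=${1:-/}
    # df -P keeps each filesystem on one line, so field 3 is always "Used".
    df_used=$(df -kP "$mount_point" | awk 'NR==2 {print $3}')
    # du -x stays on this one filesystem; permission errors are discarded.
    du_used=$(du -skx "$mount_point" 2>/dev/null | awk '{print $1}')
    echo "df_used_kib=$df_used du_used_kib=$du_used gap_kib=$((df_used - du_used))"
}
```

Run as root with a mount point as the argument, for example compare_df_du /. A gap of many gigabytes, as in this incident, suggests space that du cannot reach by walking the directory tree.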

3. Searching Baidu and Google turned up http://www.chinaunix.net/old_jh/6/465673.html, which contains these two passages:
(1):
When you open a file, you get a pointer. Subsequent writes to this file
references this file pointer. The write call does not check to see if the file
is there or not. It just writes to the specified number of characters starting
at a predetermined location. Regardless of whether the file exist or not, disk
blocks are used by the write operation.

The df command reports the number of disk blocks used while du goes through the
file structure and reports the number of blocks used by each directory. As
far as du is concerned, the file used by the process does not exist, so it does
not report blocks used by this phantom file. But df keeps track of disk blocks
used, and it reports the blocks used by this phantom file.
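And the reply from user leolein:
Thanks, that was exactly the cause.
The disk was nearly full, so I deleted some expired files; the applications were probably still holding handles to those files, which led to the problem I described.
Once I stopped all the applications, du and df reported roughly the same figures.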

(2):
This section gives the technical explanation of why du and df sometimes report
different totals of disk space usage.

When a program that is running in the background writes to a file while the
process is running, the file to which this process is writing is deleted.
Running df and du shows a discrepancy in the amount of disk space usage. The
df command shows a higher value.

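If a file has been deleted but a process still holds a reference to it (I am not sure how best to put this), the usage that df reports does not drop by the size of that deleted file. When a file is created and written, I believe the free-space check is based on what df sees, so at df 100% nothing more can be written. Creating an empty file still works, though; I tested that. The way to find these leftover processes (for want of a better name) is lsof:
# lsof /home | grep /home/oracle/osinfo | sort -k 9 | grep '^.*070920.*$'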
sadc 17821 root 3w REG 253,1 326492112 926724 /home/oracle/osinfo/070920sar.data (deleted)
sadc 17861 root 3u REG 253,1 326492112 926724 /home/oracle/osinfo/070920sar.data (deleted)
sadc 17981 root 3u REG 253,1 326492112 926724 /home/oracle/osinfo/070920sar.data (deleted)
top 17858 root 1w REG 253,1 169919916 927111 /home/oracle/osinfo/070920top.data (deleted)
top 17977 root 1w REG 253,1 169919916 927111 /home/oracle/osinfo/070920top.data (deleted)
Note the (deleted) at the end of each line.
Killing all of these processes then releases the space.
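On a machine without lsof, the same information can be read from /proc. The following is a minimal sketch, assuming Linux and GNU stat; list_deleted_open_files is a name made up for this illustration, and it needs root to see other users' processes. Where lsof is installed, lsof +L1 (open files with a link count below 1) should give a similar list.

```shell
#!/bin/sh
# List files that have been deleted but are still held open, by scanning /proc.
# Output columns: PID, fd number, size in bytes, link target.
list_deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            /memfd:*) continue ;;            # anonymous memory, not disk space
            *" (deleted)")
                pid=${fd#/proc/}; pid=${pid%%/*}
                size=$(stat -L -c %s "$fd" 2>/dev/null) || size="?"
                printf '%s\t%s\t%s\t%s\n' "$pid" "${fd##*/}" "$size" "$target"
                ;;
        esac
    done
}

# Show the 20 largest entries, biggest first.
list_deleted_open_files | sort -t "$(printf '\t')" -k 3,3nr | head -n 20
```

The PIDs in the first column are the processes that would have to close the file, or be restarted, before the space comes back.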

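That reminded me: when I ran echo "" > shop_web.log and similar commands this morning, I had not stopped Tomcat, so the applications kept writing to their logs the whole time. My reading was that at the moment of the >, both du -sh * and df -h may have shown the space as freed, but that when Tomcat went on writing to shop_web.log it was still working with the file it had originally opened, the large one from before the >, so the space was never really released, and the alert stayed quiet for a day only because of the space freed by removing the tar.gz packages. Strictly speaking, truncating a file with > does release its blocks even while another process has it open; what keeps space pinned is a file that was deleted while a process still held it open, as in the lsof output above. Either way, the processes holding the old files have to let go of them.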
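What truncating a log with > does while the writer keeps it open can be checked with a small experiment. This is a sketch under assumptions: Linux with GNU stat, a writer that does not use O_APPEND, and log_file as a scratch file standing in for the real log.

```shell
#!/bin/sh
# Experiment: truncate a file with ">" while a writer still has it open.
log_file=$(mktemp)                                   # hypothetical stand-in for the log

exec 8>"$log_file"                                   # the "application": fd 8, no O_APPEND
dd if=/dev/zero bs=1024 count=1024 >&8 2>/dev/null   # it writes 1 MiB of log data

: > "$log_file"                                      # the admin truncates the log
size_after_truncate=$(stat -c %s "$log_file")

printf 'x' >&8                                       # the application writes one more byte
size_after_write=$(stat -c %s "$log_file")           # apparent size, as ls -l shows it
blocks_after_write=$(stat -c %b "$log_file")         # allocated blocks, as du counts them

echo "size right after truncate: $size_after_truncate"
echo "size after next write:     $size_after_write"
echo "allocated blocks:          $blocks_after_write"

exec 8>&-
rm -f "$log_file"
```

The writer carries on at its old offset, so the apparent size jumps back past 1 MiB, but on filesystems that support sparse files the allocated block count should stay small. In other words, ls -l can make a truncated log look as large as before even though the space has been given back.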

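4. Restarting Tomcat and nginx
So Tomcat and nginx needed a restart, so that the applications would stop holding on to the old files. There are quite a few Tomcat instances, so I wrote a script to restart them: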
[[email protected] local]# cat /root/start_tomcat_port.sh
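#!/bin/bash
# Restart the Tomcat instance whose install directory ends in _<port>.
# Usage: start_tomcat_port.sh <port>
PORT=$1
# Find the PID of that instance, excluding grep and this script itself.
PID=$(ps -eaf | grep "apache-tomcat-6.0.37_${PORT}" | grep -v grep | grep -v start_tomcat_port | awk '{print $2}')
echo "$PORT"
echo "$PID"
# Kill only if a PID was found; $PID stays unquoted so several PIDs are passed separately.
[ -n "$PID" ] && kill -9 $PID
rm -f "/var/tomcat/${PORT}.pid"
"/usr/local/apache-tomcat-6.0.37_${PORT}/bin/startup.sh"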
[[email protected] local]#
Restart the Tomcat instances:
sh /root/start_tomcat_port.sh 6100;
sh /root/start_tomcat_port.sh 6200;
sh /root/start_tomcat_port.sh 6300;
sh /root/start_tomcat_port.sh 6400;
sh /root/start_tomcat_port.sh 6500;
sh /root/start_tomcat_port.sh 6700;
sh /root/start_tomcat_port.sh 7100;
sh /root/start_tomcat_port.sh 7200;
sh /root/start_tomcat_port.sh 7300;
Restart nginx:
service nginx restart

5. Checking the disk space again
[[email protected] local]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      113G   18G   90G  17% /
/dev/sda1              99M   13M   82M  14% /boot
tmpfs                 8.8G     0  8.8G   0% /dev/shm
[[email protected] local]#

df -h is back to normal: usage on / has dropped from 113G to 18G, leaving 90G available. Disk usage is now only 17%, and the nagios alert has cleared.

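6. A summary of the underlying principles
How they work:
du -s adds up the blocks used by every directory, symbolic link and file it can reach under the given path to arrive at a total;
df reads the filesystem's own block allocation data to get the total and free block counts.
du is a user-level program and does not count metadata (blocks the filesystem allocates for its own use).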
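The two approaches can be imitated by hand, which makes the difference easier to see. This is a rough sketch assuming GNU stat and find; fs_used_bytes and tree_used_bytes are names made up for this illustration, and the real du also avoids counting hard-linked files twice, which this version does not.

```shell
#!/bin/sh
# Roughly what df does: ask the filesystem for its own block counters.
# %b = total blocks, %f = free blocks, %S = fundamental block size.
fs_used_bytes() {
    stat -f -c '%b %f %S' "$1" | awk '{ printf "%.0f\n", ($1 - $2) * $3 }'
}

# Roughly what du does: walk the tree and add up each entry's allocated blocks.
# %b = blocks allocated to the entry, %B = size in bytes of each such block.
tree_used_bytes() {
    find "$1" -xdev -exec stat -c '%b %B' {} + 2>/dev/null |
        awk '{ total += $1 * $2 } END { printf "%.0f\n", total }'
}
```

Calling fs_used_bytes / and tree_used_bytes / gives the two totals in bytes. A file with no name left in any directory is never visited by the walk, but its blocks are still part of the filesystem's counters.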

P.S. If an application still holds an open handle on a deleted file, df shows less free space than expected, while du is unaffected.
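Example:
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    /* O_CREAT so the demo also works when tempfile does not exist yet. */
    int fd = open("tempfile", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open error");
        exit(EXIT_FAILURE);
    }

    /* Remove the name; the inode and its blocks survive while fd stays open. */
    if (unlink("tempfile") < 0) {
        perror("unlink error");
        exit(EXIT_FAILURE);
    }

    printf("file unlinked\n");
    sleep(15);  /* in this window du cannot see the file; df still counts any blocks it holds */
    printf("done\n");
    close(fd);  /* closing the last descriptor lets the kernel free the inode */
    exit(0);
}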
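The same effect can be reproduced from a shell without compiling anything. The sketch below assumes Linux (/proc) and GNU stat, and demo_file is a scratch file created just for the demonstration. It also shows a way to give the space back without killing the process, by truncating the deleted file through its /proc entry; the holder keeps running, but whatever it had written to that file is lost.

```shell
#!/bin/sh
# Demo: a deleted file keeps its space for as long as a descriptor stays open.
demo_file=$(mktemp)                                  # hypothetical scratch file

exec 9>"$demo_file"                                  # hold the file open on fd 9
dd if=/dev/zero bs=1024 count=1024 >&9 2>/dev/null   # write 1 MiB through it
rm -f "$demo_file"                                   # unlink: du can no longer find it

held_size=$(stat -L -c %s "/proc/$$/fd/9")           # size of the now nameless inode
echo "after rm:       $held_size bytes still held via fd 9"

: > "/proc/$$/fd/9"                                  # truncate through /proc: data is released
freed_size=$(stat -L -c %s "/proc/$$/fd/9")
echo "after truncate: $freed_size bytes"

exec 9>&-                                            # closing the last descriptor frees the inode
```

For another process, the path would be /proc/PID/fd/N, with the PID and descriptor number taken from the lsof output.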
