The Most Common Text Search and Processing Tools in Linux

阿新 • Published: 2019-01-04

find

find - search for files in a directory hierarchy

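The most common usage is find <path> -name <filename>, which searches <path> and all of its subdirectories for files and directories named <filename>.

find can also filter by many other criteria: whether to follow symbolic links, permissions, owner, modification time (-mtime -n/+n), access time (-atime -n/+n), status-change time (-ctime -n/+n), file size, and file type. It can also act on what it finds, for example deleting the matches or running a given command on each of them.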

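For example, to find every file and directory under /tmp (including its subdirectories) whose name starts with core, and then delete them: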

# which rm
/bin/rm
# ls -ltr /tmp | grep -i core
-rw-r--r-- 1 root root 0 Jul 6 06:25 core
drwxr-xr-x 2 root root 4096 Jul 6 06:25 core01
-rw-r--r-- 1 root root 0 Jul 6 06:25 core2
# find /tmp -name 'core*'
/tmp/core2
/tmp/core01
/tmp/core
# find /tmp -name 'core*' | xargs /bin/rm -rf
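Piping plain find output into xargs splits names on whitespace, so a file called "core 2" would be passed to rm as two arguments. Below is a minimal sketch of a safer variant, assuming a find/xargs that support -print0 and -0 (GNU and BSD both do). The function name delete_core_files and the scratch directory are hypothetical, used here so the demo does not touch the real /tmp:

```shell
#!/bin/sh
# Hypothetical helper: delete every file or directory whose name starts with
# "core" anywhere under the directory given as $1.
delete_core_files() {
    # -prune: do not descend into a matching directory (rm -rf removes it whole)
    # -print0 / xargs -0: separate names with NUL bytes, so "core 2" stays one argument
    find "$1" -name 'core*' -prune -print0 | xargs -0 rm -rf
}

# Demo on a scratch directory instead of the real /tmp.
demo=$(mktemp -d)
touch "$demo/core" "$demo/core 2" "$demo/keep.txt"
mkdir "$demo/core01"
touch "$demo/core01/core.log"

delete_core_files "$demo"
ls "$demo"          # only keep.txt is left
```

find's own `-exec rm -rf {} +` is another common way to get the same robustness without xargs.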

Find all files larger than 100 MB:

# find / -size +100M
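The other criteria mentioned above can be combined in the same way. Here is a small sketch on a scratch directory, assuming GNU find and GNU touch (`touch -d` with a relative date such as "3 days ago" is a GNU extension). The file names are made up for illustration:

```shell
#!/bin/sh
demo=$(mktemp -d)

# A ~2 MB file, a tiny file, and a tiny file back-dated by 3 days (GNU touch -d).
head -c 2000000 /dev/zero > "$demo/big.bin"
echo hi > "$demo/small.txt"
echo old > "$demo/old.txt"
touch -d '3 days ago' "$demo/old.txt"

# Regular files (-type f) larger than 1 MiB; find rounds sizes up to whole MiB.
find "$demo" -type f -size +1M

# Modified more than one full day ago: -mtime ignores fractional days,
# so +1 only matches files that are at least 2 days old.
find "$demo" -type f -mtime +1

# Modified within the last 24 hours.
find "$demo" -type f -mtime -1
```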

which

which - locate a command

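It finds where an executable command available in the current environment is located. In practice it simply searches the directories listed in the PATH environment variable.

For example, to find where the ls command is:
# which ls
/bin/ls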
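To make "it searches PATH" concrete, here is a rough sketch of the core idea behind which: split $PATH on colons and report the first directory that contains an executable file with that name. The function name my_which is hypothetical, and real which implementations handle more cases (empty PATH entries, the -a option, and so on):

```shell
#!/bin/sh
# Hypothetical re-implementation of the core idea behind `which NAME`.
my_which() {
    name=$1
    old_ifs=$IFS
    IFS=:                       # split $PATH on ":" in the loop below
    for dir in $PATH; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1                    # not found in any PATH directory
}

my_which ls      # e.g. /bin/ls or /usr/bin/ls, depending on the system
```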

locate

locate - find files by name

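The biggest difference between locate and find is that find walks the file system each time it runs, while locate reads from a database (on Ubuntu: /var/lib/mlocate/mlocate.db), which makes locate much faster. The downside is that a file only shows up after the database has been updated. The system updates the database automatically on a schedule, but a newly created file may not have been picked up yet; in that case, run updatedb manually: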

# touch file2.txt
# locate file2.txt
# updatedb
# locate file2.txt
/root/file2.txt

Also, in my testing, files under /tmp are never added to the locate database. On Ubuntu this is because /tmp is listed in PRUNEPATHS in /etc/updatedb.conf:
# touch /root/file1.txt
# touch /tmp/file1.txt
# locate file1.txt
# find / -name file1.txt
/root/file1.txt
/tmp/file1.txt
# updatedb
# locate file1.txt
/root/file1.txt

grep

grep - print lines matching a pattern

Match either of two keywords. -e can be given several times, once per pattern; alternatively, -E enables extended regular expressions, in which | means "or":
# grep -e root -e qingsong /etc/passwd
root:x:0:0:root:/root:/bin/bash
qingsong:x:1000:1000:qingsong,,,:/home/qingsong:/bin/bash

# grep -E 'root|qingsong' /etc/passwd
root:x:0:0:root:/root:/bin/bash
qingsong:x:1000:1000:qingsong,,,:/home/qingsong:/bin/bash

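-i ignore case
-v invert the match (print lines that do not match)
-c print only the number of matching lines
-n also print the line number of each line
-m NUM stop after NUM matching lines
-A NUM print each match plus the NUM lines after it
-B NUM print each match plus the NUM lines before it
-C NUM print each match plus NUM lines before and after it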

# cat 1.txt
adidas
bomb
cruise
dazzle
efficient
global
hard
is
jack
long
moon
no
operate
paste
quality
# grep o -m 3 1.txt
bomb
global
long
# grep effi -A 3 1.txt
efficient
global
hard
is
# grep effi -B 3 1.txt
bomb
cruise
dazzle
efficient
# grep effi -C 2 -n 1.txt
3-cruise
4-dazzle
5:efficient
6-global
7-hard

With -n, matching lines are printed as "N:" and context lines as "N-".
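The -c, -v and -i options are not demonstrated above. Here is a short sketch against the same word list, where the helper function sample is a hypothetical stand-in for 1.txt:

```shell
#!/bin/sh
# Stand-in for 1.txt: the same 15 words, one per line.
sample() {
    printf '%s\n' adidas bomb cruise dazzle efficient global hard is \
        jack long moon no operate paste quality
}

sample | grep -c o        # 6: bomb, global, long, moon, no, operate
sample | grep -vc o       # 9: the remaining lines, which contain no "o"
sample | grep -in EFFI    # 5:efficient (case-insensitive, with line number)
```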

sort

Sorting. Common syntax:
sort [-ntkr] file
-n compare numerically
-t set the field separator
-k choose the key (field) to sort on
-r reverse the order

$ cat sort.txt
b:3 11
c:2 123
a:4 24
e:5 52
d:1 01
f:11 111
$ sort sort.txt
a:4 24
b:3 11
c:2 123
d:1 01
e:5 52
f:11 111
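Split fields on ":" and sort by the second field. Strictly speaking, -k 2 makes the key run from field 2 to the end of the line; use -k 2,2 to restrict the key to field 2 alone:
$ sort -t ":" -k 2 sort.txt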
d:1 01
f:11 111
c:2 123
b:3 11
a:4 24
e:5 52

Split fields on whitespace (the default) and sort by the second field:
$ sort -k 2 sort.txt
d:1 01
b:3 11
f:11 111
c:2 123
a:4 24
e:5 52

111 ends up before 24 and 52 because the key is compared as text, character by character, rather than as a number. To compare numerically, add -n:
$ sort -k 2 -n sort.txt
d:1 01
b:3 11
a:4 24
e:5 52
f:11 111
c:2 123
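The separator, key range, and numeric options can be combined. Here is a sketch that reuses the sort.txt content through a hypothetical helper, sample. With -t ':' and -k 2,2n, sort compares only field 2, and compares it numerically. -n reads the leading number of the key ("3 11" is read as 3), so the lines end up ordered by the value right after the colon. Adding r reverses the order:

```shell
#!/bin/sh
# Stand-in for sort.txt.
sample() {
    printf '%s\n' 'b:3 11' 'c:2 123' 'a:4 24' 'e:5 52' 'd:1 01' 'f:11 111'
}

# Key = field 2 only (-k 2,2), compared numerically (n), fields split on ":".
sample | sort -t ':' -k 2,2n
# d:1 01
# c:2 123
# b:3 11
# a:4 24
# e:5 52
# f:11 111

# Same key, reversed (r): f:11 111 comes first.
sample | sort -t ':' -k 2,2nr
```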

uniq

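uniq - report or omit repeated lines. Note that "repeated" here means adjacent duplicate lines.

Common options:
-c, --count prefix each line with the number of times it occurs
-d, --repeated only print lines that are repeated
-u, --unique only print lines that are not repeated
-i, --ignore-case ignore case when comparing lines

Because uniq only compares each line with the line right before it, it is usually combined with sort:
$ cat uniq.txt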
aaa
abc
12354
abc
12354
ccc
abc

Running uniq alone has no effect here. The file does contain duplicates, but no two identical lines are adjacent:
$ cat uniq.txt | uniq
aaa
abc
12354
abc
12354
ccc
abc

Remove duplicate lines:
$ cat uniq.txt | sort | uniq
12354
aaa
abc
ccc

Show only the lines that are repeated:
$ cat uniq.txt | sort | uniq -d
12354
abc

Show only the lines that are never repeated:
$ cat uniq.txt | sort | uniq -u
aaa
ccc

Show the repeated lines together with their counts:
$ cat uniq.txt | sort | uniq -d -c
2 12354
3 abc
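A common idiom builds on this: count how often each line occurs and list the most frequent lines first. Here is a sketch using the uniq.txt content, with a hypothetical stand-in helper, sample. It also shows that sort -u gives the same result as sort | uniq:

```shell
#!/bin/sh
# Stand-in for uniq.txt.
sample() {
    printf '%s\n' aaa abc 12354 abc 12354 ccc abc
}

# Group identical lines (sort), count each group (uniq -c),
# then sort by that count, largest first (sort -rn).
sample | sort | uniq -c | sort -rn
#       3 abc
#       2 12354
# ...followed by the two lines that occur once (aaa and ccc)

# sort -u drops duplicates by itself, like sort | uniq.
sample | sort -u
```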

cut

cut - remove sections from each line of files

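Extract selected columns (fields) or characters from each line.
-d set the delimiter
-f select fields
-c select characters by position
-s with -f, do not print lines that contain no delimiter

Take /etc/passwd as an example:
$ cat /etc/passwd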
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
...<omitted>...

Extract only the first field, i.e. the list of users:
$ cat /etc/passwd | cut -f 1 -d ":"
root
daemon
bin
sys
sync
...<omitted>...

Extract fields 1 and 6 (user name and home directory):
$ cat /etc/passwd | cut -f 1,6 -d ":"
root:/root
daemon:/usr/sbin
bin:/bin
sys:/dev
sync:/bin
...<omitted>...

Extract characters 1-5 and 7-10 of each line:
$ cat /etc/passwd | cut -c 1-5,7-10
root::0:0
daemo:x:1
bin:x2:2:
sys:x3:3:
sync::4:6
...<omitted>...
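The effect of -s is easiest to see on input where some lines have no delimiter. Here is a small sketch; the second input line is made up for illustration:

```shell
#!/bin/sh
sample() {
    printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' 'a line without colons'
}

# Without -s, a line that contains no delimiter is printed unchanged.
sample | cut -d ':' -f 1,7
# root:/bin/bash
# a line without colons

# With -s, such lines are suppressed.
sample | cut -d ':' -f 1,7 -s
# root:/bin/bash

# An open-ended range: field 6 through the last field.
sample | cut -d ':' -f 6- -s
# /root:/bin/bash
```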

tr

tr - translate or delete characters. Note that tr works on individual characters, not on strings.

tr [OPTION]... SET1 [SET2]
Replaces each character in SET1 with the character at the same position in SET2.

-d, --delete delete characters in SET1, do not translate

SETs are specified as strings of characters. Most represent themselves. Interpreted sequences are:

\t horizontal tab

CHAR1-CHAR2
all characters from CHAR1 to CHAR2 in ascending order

[:alnum:]
all letters and digits

[:alpha:]
all letters

[:blank:]
all horizontal whitespace

[:lower:]
all lower case letters

[:upper:]
all upper case letters

$ cat tr.txt
Hello World
This is only a sample
112345666789

Convert all lowercase letters to uppercase:
$ cat tr.txt | tr a-z A-Z
HELLO WORLD
THIS IS ONLY A SAMPLE
112345666789

Convert all uppercase letters to lowercase. Quote the classes so the shell does not treat [...] as a filename pattern:
$ cat tr.txt | tr '[:upper:]' '[:lower:]'
hello world
this is only a sample
112345666789

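Replace blanks with a TAB. -s squeezes repeats: each run of the same output character is reduced to a single one, so several consecutive spaces become one TAB:
$ cat tr.txt | tr -s '[:blank:]' '\t'
Hello	World
This	is	only	a	sample
112345666789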

Delete all vowels:
$ cat tr.txt | tr -d aeiouAEIOU
Hll Wrld
Ths s nly smpl
112345666789

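Mapping: each digit in SET1 is replaced by the character at the same position in SET2, so 1 becomes 8, 2 becomes 7, and so on:
$ echo 12345 | tr '0-9' '9876543210'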
87654

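Complement:
tr -c SET1 [SET2]
SET2 is optional. -c selects the complement of SET1, i.e. every character that is not in SET1. The most common use is deleting everything outside the set, for example keeping only digits, spaces and newlines:
$ echo hello 1 char 2 next 4 | tr -d -c '0-9 \n'
 1  2  4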
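-c and -s are often used together. Here is a sketch of a well-known idiom: replace every run of non-letters with a single newline, which puts one word on each line. Piped into sort | uniq -c, this becomes a crude word counter. The input sentence is made up:

```shell
#!/bin/sh
# -c: act on the complement of 'A-Za-z' (every non-letter, newlines included)
# -s: squeeze each resulting run of '\n' into a single newline
printf 'Hello, world! Hello.\n' | tr -cs 'A-Za-z' '\n'
# Hello
# world
# Hello

# Count the words.
printf 'Hello, world! Hello.\n' | tr -cs 'A-Za-z' '\n' | sort | uniq -c
```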

Another neat trick is adding up the numbers in a file:
$ cat count
5
14
6
18
12
7
11
9
12
6
11
13
7
7
$ cat count | echo $[ $( tr '\n' + ) 0 ]
138

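Breaking it down: first, tr replaces each newline with a + sign. Note that echo itself ignores standard input; the piped data is read by the tr inside $( ).
$ cat count | echo $( tr '\n' + )
5+14+ 6+18+12+ 7+11+ 9+12+ 6+11+13+ 7+ 7+
Then a 0 is appended, so the trailing + has a right-hand operand and the whole thing becomes a complete expression:
$ cat count | echo $( tr '\n' + ) 0
5+14+ 6+18+12+ 7+11+ 9+12+ 6+11+13+ 7+ 7+ 0
Finally, $[ expression ] evaluates the expression. $[ ] is an old, deprecated bash form of arithmetic expansion. $(( )) does the same thing and is the standard form, so the command can also be written as:
$ cat count | echo $(($( tr '\n' + ) 0))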
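The version above relies on the trailing 0 to complete the last +. Here is a sketch of two alternatives that avoid that trick, using the same numbers as the count file; the helper nums is a hypothetical stand-in for that file. paste -s -d + joins the lines with + and adds no trailing separator, while awk keeps a running total:

```shell
#!/bin/sh
# Stand-in for the count file.
nums() { printf '%s\n' 5 14 6 18 12 7 11 9 12 6 11 13 7 7; }

# 5+14+6+...+7, evaluated with standard $(( )) arithmetic expansion.
echo $(( $(nums | paste -s -d + -) ))

# awk adds field 1 of every line to s, then prints s after the last line.
nums | awk '{ s += $1 } END { print s }'
```

Both commands print 138, the same total as above.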

paste

paste - merge lines of files

SYNOPSIS
paste [OPTION]... [FILE]...

DESCRIPTION
Write lines consisting of the sequentially corresponding lines from each FILE, separated by TABs, to standard output.

$ cat a.txt
1
2
3
4
$ cat b.txt
a
b
c
$ paste a.txt b.txt
1 a
2 b
3 c
4
$ paste -d ':' a.txt b.txt
1:a
2:b
3:c
4:
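paste can also read standard input (written as -), even several times in one command. The -s ("serial") option instead joins all lines of a single input into one line. Here is a sketch using the a.txt content via a hypothetical stand-in helper, sample:

```shell
#!/bin/sh
sample() { printf '%s\n' 1 2 3 4; }    # stand-in for a.txt

# Each "-" takes the next line from stdin, so two of them pair lines up.
sample | paste -d ',' - -
# 1,2
# 3,4

# -s joins every line of the input into a single line.
sample | paste -s -d ',' -
# 1,2,3,4
```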

head

head - output the first part of files

Print the first 10 lines of each file. When several files are given, each one gets a ==> name <== header:
$ head pass.txt ls.txt
==> pass.txt <==
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin

==> ls.txt <==
total 526828
drwxr-xr-x 3 qingsong qingsong 4096 Jul 24 07:42 .
drwxr-xr-x 3 root root 4096 Jul 4 07:22 ..
-rw-rw-r-- 1 qingsong qingsong 8 Jul 24 07:37 a.txt
-rw------- 1 qingsong qingsong 983 Jul 17 08:19 .bash_history
-rw-r--r-- 1 qingsong qingsong 220 Jul 4 07:22 .bash_logout
-rw-r--r-- 1 qingsong qingsong 3637 Jul 4 07:22 .bashrc
-rw-rw-r-- 1 qingsong qingsong 6 Jul 24 07:37 b.txt
-rw-r--r-- 1 qingsong qingsong 3443 Jul 24 05:17 by the number of occurrencesq
drwx------ 2 qingsong qingsong 4096 Jul 17 07:17 .cache

Print the first 3 lines:
$ head -n 3 line.txt
This is line 1
This is line 2
This is line 3

Print all lines except the last 3 (a negative count is a GNU head extension):
$ head -n -3 line.txt
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6

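You can also print the first N bytes. Here the output stops after byte 25, in the middle of the second line, so the next shell prompt appears directly after it:
$ head -c 25 line.txt
This is line 1
This is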

tail

tail - output the last part of files

tail works much like head. Show only the last 3 lines of a file:

$ tail -n 3 line.txt
This is line 7
This is line 8
This is line 9

Print everything from line 3 onward:

$ tail -n +3 line.txt
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9

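With -f, tail runs in follow mode: it keeps the file open and prints new lines as they are appended. This is useful for files whose content keeps changing, such as logs:

$ tail -n 100 -f /var/log/syslog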
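Combining head and tail prints an arbitrary range of lines. Here is a sketch that prints lines 4 to 6 of a nine-line file like line.txt, using a hypothetical stand-in helper, lines. head -n 6 keeps lines 1-6, and tail -n +4 then starts the output from the fourth of those:

```shell
#!/bin/sh
# Stand-in for line.txt: "This is line 1" ... "This is line 9".
lines() {
    for i in 1 2 3 4 5 6 7 8 9; do
        echo "This is line $i"
    done
}

lines | head -n 6 | tail -n +4
# This is line 4
# This is line 5
# This is line 6
```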

less

Space: page down
b: page up
/string: search forward
?string: search backward
n: next match
N: previous match

more

Space: page down
b: page up (only works on files, not on piped input)
/string: search forward

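Regular expressions

Match any single character: "."
Match the preceding character zero or more times: "*"
Control exactly how many times the preceding character occurs:
"\{n\}": exactly n times
"\{n,\}": at least n times
"\{n,m\}": at least n and at most m times
Match at the start of a line: "^"
Match at the end of a line: "$"
Match any one of a set of characters: "[]", e.g. any uppercase letter: [A-Z]
Escape a special character: "\"
Match the left boundary of a word: "\<"
Match the right boundary of a word: "\>"
Match a boundary between a word and a symbol: "\b"
Match a position between two word characters or between two symbols (i.e. not a boundary): "\B"
Word characters include Chinese characters, English letters and digits; symbols include Chinese and English punctuation, spaces, tabs and newlines.
Match a letter, digit or underscore: "\w", equivalent to [A-Za-z0-9_]
Match anything other than a letter, digit or underscore: "\W", equivalent to [^A-Za-z0-9_]
Match any whitespace character: "\s"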
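A quick way to check the word-boundary and \w rules is grep -o, which prints only the matched parts. Here is a sketch assuming GNU grep (\<, \>, \w and \+ in basic regular expressions are GNU extensions); the input strings are made up:

```shell
#!/bin/sh
# \<cat\> only matches "cat" as a whole word, not inside "category" or "bobcat".
printf 'cat category bobcat\n' | grep -o '\<cat\>'
# cat

# \w includes the underscore, so foo_bar stays one match; "-" separates it from baz.
printf 'foo_bar-baz\n' | grep -o '\w\+'
# foo_bar
# baz
```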

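Extended regular expressions:
Match the preceding character zero or one time: "?"
Match the preceding character one or more times: "+"
Alternation ("or"): "|"
Group alternatives: "()", usually used together with "|"; for example, to match hard, hold or hood: h(ar|ol|oo)d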
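Here is a small sketch contrasting the two syntaxes with grep: -E selects extended regular expressions, while plain grep uses basic ones, where the interval braces need backslashes. The word list is made up for illustration:

```shell
#!/bin/sh
words() { printf '%s\n' hard hold hood head; }

# Extended regex (-E): a group of alternatives, anchored to the whole line.
words | grep -E '^h(ar|ol|oo)d$'
# hard
# hold
# hood

# Basic regex: the interval has to be written with backslashes.
words | grep 'o\{2\}'
# hood

# The same interval in extended syntax, without backslashes.
words | grep -E 'o{2}'
# hood
```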
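Online regex visualization tools can render an (extended) regular expression as a diagram, which makes it easier to check your own regular expressions and to understand other people's.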

sed

awk

<To be continued>
