Nginx: Blocking Crawlers and Limiting Concurrent Client Requests
阿新 • Published: 2018-03-03
The anti-crawler rules and request limits live in a separate file, included by the virtual host:

cat /usr/local/nginx/conf/agent_deny.conf
#Block common crawler user agents (case-insensitive)
if ($http_user_agent ~* "qihoobot|Baiduspider|Googlebot|Googlebot-Mobile|Googlebot-Image|Mediapartners-Google|Adsbot-Google|Feedfetcher-Google|Yahoo! Slurp|Yahoo! Slurp China|YoudaoBot|Sosospider|Sogou spider|Sogou web spider|MSNBot|ia_archiver|Tomato Bot|Catall Spider|AcoiRobot") {
    return 403;
}
#Block scraping tools and empty user agents (case-sensitive; note "|^$" matches an empty User-Agent)
if ($http_user_agent ~ "WinHttp|WebZIP|FetchURL|node-superagent|java/|FeedDemon|Jullo|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|Java|Feedly|Apache-HttpAsyncClient|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|BOT/0.1|YandexBot|FlightDeckReports|Linguee Bot|iaskspider|^$") {
    return 403;
}
#Allow only GET, HEAD and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}
if ($http_user_agent ~* (Python|Java|Wget|Scrapy|Curl|HttpClient|Spider)) {
    return 403;
}
#To block a single IP:
#deny 123.45.6.7;
#To block a whole /8, i.e. 123.0.0.1 through 123.255.255.254:
#deny 123.0.0.0/8;
#To block a /16, i.e. 123.45.0.1 through 123.45.255.254:
#deny 123.45.0.0/16;
#To block a /24, i.e. 123.45.6.1 through 123.45.6.254:
#deny 123.45.6.0/24;
#The following range is known to be abusive:
deny 58.95.66.0/24;
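The first case-insensitive `if` block above can be sanity-checked offline before deploying. The sketch below mirrors it in Python's `re` module with a shortened, purely illustrative subset of the config's pattern list:

```python
import re

# Subset of the UA patterns from agent_deny.conf, for illustration only;
# nginx's ~* operator is a case-insensitive regex match, like re.IGNORECASE.
blocked_ua = re.compile(
    r"qihoobot|Baiduspider|Googlebot|Yahoo! Slurp|Sogou spider|MSNBot",
    re.IGNORECASE,
)

def is_blocked(user_agent: str) -> bool:
    """Return True if the first `if` block would answer 403 for this UA."""
    return bool(blocked_ua.search(user_agent))

print(is_blocked("Mozilla/5.0 (compatible; Baiduspider/2.0)"))  # True
print(is_blocked("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```

Running real User-Agent strings from your access log through such a check is a quick way to confirm the pattern does not accidentally block legitimate browsers.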
Notes:
Normally the Baidu and Google crawlers are allowed to index a site's content, e.g. the official home page, so Baiduspider and Googlebot can be left unblocked (removed from the deny patterns) if you want them to crawl the site.
The file agent_deny.conf is included inside the server virtual-host block of the official site.
The following nginx configuration is for a reverse-proxy load balancer:
server {
    listen 80;
    server_name pk.tltest.com static.tltest.com;
    access_log /home/wwwlogs/access.log main;

    ## The anti-crawler rules
    include /usr/local/nginx/conf/agent_deny.conf;

    location / {
        limit_req zone=reqip burst=200 nodelay;
        proxy_cache cache_one;
        proxy_cache_valid 200 304 301 302 99s;
        proxy_cache_valid any 1s;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header REMOTE-HOST $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Connection "";
        proxy_http_version 1.1;
        proxy_ignore_client_abort on;
        proxy_ignore_headers Set-Cookie Cache-Control;
        client_max_body_size 30m;
        client_body_buffer_size 256k;
        proxy_connect_timeout 75;
        proxy_send_timeout 300;
        proxy_read_timeout 300;
        proxy_buffer_size 1m;
        proxy_buffers 8 512k;
        proxy_busy_buffers_size 2m;
        proxy_temp_file_write_size 2m;
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503;
        proxy_max_temp_file_size 128m;
        proxy_pass http://backend;
    }

    location ~* \.(php|python)$ {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_pass http://backend;
    }

    #### Limit the concurrency of client search requests to this path
    location = /novel/search {
        limit_conn conip 2;
        limit_req zone=reqip burst=3 nodelay;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_pass http://backend;
        #access_log /home/wwwlogs/search.log main;
    }

    #### Limit the concurrency and bandwidth of content downloads from this path
    location = /novel/read/cache {
        limit_conn conip 1;
        limit_req zone=reqip burst=2 nodelay;
        limit_rate 512k;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_pass http://backend;
        #access_log /home/wwwlogs/download.log main;
    }

    #### Limit the concurrency of apk downloads from this path
    location = /novel/read/content {
        limit_conn conip 5;
        limit_req zone=reqip burst=10 nodelay;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_pass http://backend;
    }
}
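The server block above references several names whose definitions are not shown in the post: the `reqip` and `conip` shared-memory zones, the `cache_one` cache, and the `backend` upstream. These must be declared at the http level or nginx will refuse to start. A hypothetical sketch of those declarations follows; the zone sizes, rate, cache path, and upstream addresses are all assumptions to be adapted to your environment:

```nginx
http {
    # Per-client-IP request-rate zone used by "limit_req zone=reqip ..."
    limit_req_zone  $binary_remote_addr zone=reqip:20m rate=100r/s;

    # Per-client-IP connection-count zone used by "limit_conn conip N"
    limit_conn_zone $binary_remote_addr zone=conip:10m;

    # Cache storage backing "proxy_cache cache_one"
    proxy_cache_path /home/cache levels=1:2 keys_zone=cache_one:200m
                     inactive=1d max_size=10g;

    # Pool behind "proxy_pass http://backend" (addresses are placeholders)
    upstream backend {
        server 192.168.1.10:8080;
        server 192.168.1.11:8080;
    }

    # ... server blocks follow ...
}
```

Using `$binary_remote_addr` rather than `$remote_addr` as the zone key is the usual choice because it stores each IPv4 address in 4 bytes, letting a zone of a given size track more clients.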
Reference:
https://www.centos.bz/2018/01/nginx%E6%94%AF%E6%8C%81https%E5%B9%B6%E4%B8%94%E6%94%AF%E6%8C%81%E5%8F%8D%E7%88%AC%E8%99%AB/