
I. Python Command Line Tools: Study Notes (Scrapy's Command Line Tool)


The command line tool

Scrapy is controlled through the scrapy command line tool, referred to here as the "Scrapy tool" to distinguish it from its sub-commands, which we simply call "commands" or "Scrapy commands".

The Scrapy tool provides multiple commands for a variety of purposes, and each command accepts a different set of arguments and options.

Creating a project

scrapy startproject myproject [project_dir]

Creating a project from the command line:

scrapy startproject myproject E:\pythoncode\

This creates the myproject project under E:\pythoncode.

Next:

cd E:\pythoncode

Note: if project_dir is not specified, project_dir defaults to the same name, myproject.

Controlling projects

For example: scrapy genspider mydomain mydomain.com

This creates a spider named mydomain that crawls the mydomain.com site.

You can see this spider's code under E:\pythoncode\myproject\spiders.

scrapy -h

This prints the following:

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

Note: in other words, whenever a command is unclear, run scrapy <command> -h and learn it yourself.

The startproject command

Syntax: scrapy startproject <project_name> [project_dir]

Example:

scrapy startproject myproject

The genspider command

Syntax: scrapy genspider [-t template] <name> <domain>

Example:

E:\pythoncode>scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed

Note: the templates usable with -t are those listed above, from basic through xmlfeed.

E:\pythoncode>scrapy genspider example example.com
Created spider example using template basic in module:
  myproject.spiders.example

Note: scrapy genspider xx xx.com is equivalent to scrapy genspider -t basic xx xx.com, since basic is the default template.


E:\pythoncode>scrapy genspider -t crawl scrapyorg scrapy.org
Created spider scrapyorg using template crawl in module:
  myproject.spiders.scrapyorg

Note: a spider created with -t crawl differs from one created with -t basic; the crawl template generates a link-following CrawlSpider, presumably to cover sites where you need to traverse links rather than just parse the start pages.

This is just a convenient shortcut command for creating spiders from predefined templates, but it is certainly not the only way to create them. You can write the spider source files yourself instead of using this command.

The crawl command

Syntax: scrapy crawl <spider>

Example:

E:\pythoncode>scrapy crawl mydomain

[scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: myproject)
.................

Note: the crawl command should already be thoroughly familiar from the tutorial examples; it starts crawling with the named spider.

The check command

Syntax: scrapy check [-l] <spider>

Example:

E:\pythoncode>scrapy check -l mydomain

E:\pythoncode>scrapy check -l

Note: nothing is printed here because these spiders define no contracts. check runs contract checks declared in spider callback docstrings, and -l lists the spiders and callbacks that have contracts.

The list command

Syntax: scrapy list

Example:

E:\pythoncode>scrapy list
example
mydomain
scrapyorg

Note: lists the names of all spiders in the project.

The edit command

Syntax: scrapy edit <spider>

Edit the given spider using the editor defined in the EDITOR environment variable or (if unset) the EDITOR setting.

This command is provided only as a convenience shortcut for the most common case, the developer is of course free to choose any tool or IDE to write and debug spiders.

Note: the official wording is quoted above; in short, it opens the spider's source file in your configured editor.

The fetch command

Syntax: scrapy fetch <url>

Example:

E:\pythoncode>scrapy fetch --nolog http://www.example.com/some/page.html

<?xml version="1.0" encoding="iso-8859-1"?>
.....................................

Note: downloads the given URL with the Scrapy downloader and writes the content to standard output.


E:\pythoncode>scrapy fetch --nolog --headers http://www.example.com/
> Accept-Language: en
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> User-Agent: Scrapy/1.5.1 (+https://scrapy.org)
> Accept-Encoding: gzip,deflate
.......................................................

Note: --headers prints the HTTP headers instead of the body; the ">" lines above are the request headers, showing what you accessed the URL with.

The view command

Syntax: scrapy view <url>

Example:

E:\pythoncode>scrapy view https://movie.douban.com/

Note: did you hit a 403 error when the page opened? Douban checks what you access it with.
Let's see what --headers shows:

E:\pythoncode>scrapy fetch --headers --nolog https://movie.douban.com/

> Accept-Encoding: gzip,deflate
> User-Agent: Scrapy/1.5.1 (+https://scrapy.org)
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: en
>
< Server: dae
< Content-Type: text/html
< Date: Sun, 07 Oct 2018 15:06:06 GMT

Note: User-Agent: Scrapy/1.5.1 (+https://scrapy.org)
Remember to change this User-Agent the next time you crawl sites like these.
If this is unclear, a recommended read: https://blog.csdn.net/u012195214/article/details/78889602
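One way to change it is to set USER_AGENT in the project's settings.py. A minimal sketch; the browser-like string below is only an illustrative value, not a recommendation:

```python
# settings.py fragment: override Scrapy's default
# "Scrapy/x.y (+https://scrapy.org)" User-Agent with a browser-like
# string (illustrative value; substitute your own)
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0 Safari/537.36"
)
```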

The shell command

Syntax: scrapy shell [url]

Example:

E:\pythoncode>scrapy shell https://movie.douban.com/

[scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: myproject)
........................

Note: the shell apparently works best with IPython installed; once inside, learn it from the tutorial examples on the official site.

E:\pythoncode>scrapy shell --nolog https://movie.douban.com/ -c "(response.status, response.url)"
(403, 'https://movie.douban.com/')

Note: 403 means access was denied, while 200 means success. These are HTTP response codes; if unfamiliar, see https://blog.csdn.net/jackfrued/article/details/25662527
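The status codes above can also be looked up in Python's standard library, which names every HTTP code; a quick sketch:

```python
# HTTP response codes, as named in Python's standard library
from http import HTTPStatus

# 200: the request succeeded
assert HTTPStatus.OK == 200
# 403: the server understood the request but refuses to authorize it
assert HTTPStatus.FORBIDDEN == 403
# 404: the resource was not found
assert HTTPStatus.NOT_FOUND == 404

print(HTTPStatus(403).phrase)  # prints "Forbidden"
```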

The parse command

Syntax: scrapy parse <url> [options]

Fetches the given URL and parses it with the spider that handles it.

Example: the spiders here have no parsing logic yet, so there is no example.

The settings command

Syntax: scrapy settings [options]

Example:

E:\pythoncode>scrapy settings --get BOT_NAME
myproject

Note: the project name.

E:\pythoncode>scrapy settings --get DOWNLOAD_DELAY
0

Note: the download delay, in seconds, between requests.

If unclear, open the project's settings.py (scrapy.cfg in the project directory points at the settings module), and run scrapy settings -h.
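The values queried above come from the project's settings module; a minimal settings.py sketch showing just those two settings:

```python
# settings.py fragment: the two settings read by "scrapy settings --get" above
BOT_NAME = "myproject"  # the project/bot name that was printed

# seconds to wait between consecutive requests to the same site;
# the default is 0, which is why --get DOWNLOAD_DELAY printed 0
DOWNLOAD_DELAY = 0
```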

The runspider command

Syntax: scrapy runspider <spider_file.py>

Example:

E:\pythoncode>scrapy runspider E:\pythoncode\myproject\spiders\mydomain.py

[scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: myproject)
...................................

The version command

Syntax: scrapy version [-v]

Example:

E:\pythoncode>scrapy version
Scrapy 1.5.1

E:\pythoncode>scrapy version -v
Scrapy       : 1.5.1
lxml         : XXX
libxml2      : XXX
cssselect    : XXX
parsel       : XXX
w3lib        : XXX
Twisted      : XXX
Python       : 3.XXXX
pyOpenSSL    : XXX
cryptography : XXX
Platform     : XXX

Note: the version numbers of the libraries Scrapy depends on.

原話:Prints the Scrapy version. If used with -v it also prints Python, Twisted and Platform info, which is useful for bug reports.

The bench command

Syntax: scrapy bench

Note: runs a quick benchmark test.

原話:Run a quick benchmark test.

If unclear, see: https://docs.scrapy.org/en/latest/topics/benchmarking.html#benchmarking

Custom project commands

I haven't played with these yet; I'll try them some day.

Learning need not be rushed, but it must be kept up. The original source for all of this is the official documentation: https://docs.scrapy.org/en/latest/topics/commands.html
