A Quick Try at Scraping Data in R with rvest

阿新 • Published: 2018-11-06

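A summary of extracting tag content in R with XPath and CSS selectors. XPath is more powerful, while CSS selectors usually have a more concise syntax. CSS selectors are often said to run faster, but rvest translates them into XPath (via the selectr package) before evaluating them, so speed is not a strong reason to prefer one over the other in rvest.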
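To see the translation step behind CSS selectors, you can call selectr directly. This is a minimal sketch, assuming selectr is installed (it is a dependency of rvest); the selector string is the one used later in this post, and the exact XPath that rvest builds may differ slightly in its prefix.

```r
library(selectr)  # installed alongside rvest as one of its dependencies

# Convert a CSS selector into an equivalent XPath expression
xp <- css_to_xpath("h3.lister.index.unbold.text span")

# Each ".class" becomes a contains(concat(' ', normalize-space(@class), ' '), ...) test
print(xp)
```

The generated XPath is noticeably longer than a hand-written one, which is the trade-off for the shorter CSS syntax.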

Example: extract the content of the following tag:

    <h3 class="lister index unbold text"><span>小明他很忙</span></h3>

(1) Using XPath (similar to XPath in Python). In R, html_text() returns the text inside a tag, e.g. the content of "<span>小明他很忙</span>" is "小明他很忙"; html_attr("attribute") returns the value of an attribute:

    rvest::html_nodes(webPage, xpath = '//h3[@class="lister index unbold text"]/span') %>% rvest::html_text()
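A minimal offline sketch of the XPath approach: instead of downloading a page, it parses the example tag from a string, then uses html_text() for the text of the span and html_attr() for the class attribute of the h3. The variable names are illustrative.

```r
library(rvest)  # also re-exports read_html() and the %>% pipe

# Parse the example tag from a string (no network access needed)
webPage <- read_html('<h3 class="lister index unbold text"><span>小明他很忙</span></h3>')

# Text content of the <span>, selected with XPath
spanText <- webPage %>%
  html_nodes(xpath = '//h3[@class="lister index unbold text"]/span') %>%
  html_text()

# Value of the class attribute on the <h3>
h3Class <- webPage %>%
  html_nodes(xpath = "//h3") %>%
  html_attr("class")

print(spanText)
print(h3Class)
```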

(2) Using CSS selectors

Before using them, we first need to understand a few points:

1. In CSS, a "class" is written with "." and an "id" with "#".

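2. In a CSS selector, if the class attribute contains spaces, replace each space with ".". A space-separated class attribute actually lists several classes, and chaining them with dots selects elements that carry all of them: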

h3 class="lister index unbold text" -> h3.lister index unbold text (the class contains spaces) -> h3.lister.index.unbold.text
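The space-to-dot rule above can be expressed as a tiny helper. This is a sketch only: class_to_css() is a hypothetical function written for illustration, not part of rvest.

```r
# Hypothetical helper (not part of rvest): turn a tag name plus a
# space-separated class attribute into a CSS compound selector
class_to_css <- function(tag, class_attr) {
  # Split on runs of whitespace after trimming leading/trailing spaces
  classes <- strsplit(trimws(class_attr), "[[:space:]]+")[[1]]
  # Prefix each class with "." and glue them onto the tag name
  paste0(tag, paste0(".", classes, collapse = ""))
}

sel <- class_to_css("h3", "lister index unbold text")
print(sel)  # "h3.lister.index.unbold.text"
```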

    rvest::html_nodes(webPage, css = "h3.lister.index.unbold.text span") %>% rvest::html_text()
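One practical difference is worth knowing: a CSS class selector matches each class independently, while an XPath test like @class="..." compares the whole attribute string. The sketch below uses a small hypothetical HTML fragment (classes in a different order, plus an element with an id) to show both "." and "#" in action.

```r
library(rvest)

# Hypothetical sample markup: same classes as above but in another order,
# plus a <div> identified by an id
html <- '
<h3 class="text unbold index lister"><span>first</span></h3>
<div id="main"><span>second</span></div>
'
page <- read_html(html)

# CSS "." checks each class separately, so class order does not matter
cssHit <- page %>% html_nodes(css = "h3.lister.index.unbold.text span") %>% html_text()

# CSS "#" matches an id
idHit <- page %>% html_nodes(css = "#main span") %>% html_text()

# XPath @class="..." needs the exact attribute string, so this finds nothing
xpathHit <- page %>%
  html_nodes(xpath = '//h3[@class="lister index unbold text"]/span') %>%
  html_text()

print(cssHit)            # "first"
print(idHit)             # "second"
print(length(xpathHit))  # 0
```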

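1. Install the rvest and xml2 packages

    # pacman itself must be installed first; p_load() then installs any missing packages and loads them
    if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
    pacman::p_load(rvest, xml2)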

2. Load the rvest and xml2 packages

    # Load the packages
    library(rvest)
    library(xml2)

3. Scrape data with the two packages

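    # URL to scrape: IMDb feature films released in 2016, 100 per page.
    # The class names below match the page layout at the time of writing (2018); IMDb may have changed it since.
    url <- "https://www.imdb.com/search/title?count=100&release_date=2016,2016&title_type=feature"
    # Fetch and parse the page (page source)
    webPage <- xml2::read_html(x = url, encoding = "UTF-8")

    # ======= Method 1: XPath ==========
    # Movie titles (html_text() turns the matched nodes into a character vector)
    movieName <- rvest::html_nodes(webPage, xpath = '//h3[@class="lister-item-header"]/a') %>% rvest::html_text()

    # === Note ===
    # To read an attribute value, use rvest::html_attr(), e.g. rvest::html_attr("alt"):
    # rvest::html_nodes(webPage, xpath = '//div[@class="lister-item-image float-left"]/a/img') %>% rvest::html_attr("alt")

    # Release year
    year <- rvest::html_nodes(webPage, xpath = '//span[@class="lister-item-year text-muted unbold"]') %>% rvest::html_text()

    # ======= Method 2: CSS selectors =====
    # Release year again, this time selected with CSS instead of XPath
    movieYear <- rvest::html_nodes(webPage, css = "span.lister-item-year.text-muted.unbold") %>% rvest::html_text()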
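Because a live page can change or block requests, here is an offline sketch that runs the same selectors against a small hypothetical HTML fragment imitating the class names used above, cleans the year text, and collects the results into a data frame. The fragment, titles, and variable names are made up for illustration.

```r
library(rvest)

# Hypothetical fragment reusing the class names from the selectors above
html <- '
<h3 class="lister-item-header">
  <a>Movie A</a>
  <span class="lister-item-year text-muted unbold">(2016)</span>
</h3>
<h3 class="lister-item-header">
  <a>Movie B</a>
  <span class="lister-item-year text-muted unbold">(I) (2016)</span>
</h3>'
page <- read_html(html)

# Titles via XPath, years via CSS, as in the two methods above
movieName <- page %>% html_nodes(xpath = '//h3[@class="lister-item-header"]/a') %>% html_text()
movieYear <- page %>% html_nodes(css = "span.lister-item-year.text-muted.unbold") %>% html_text()

# Keep only the four-digit year, e.g. "(I) (2016)" -> 2016
movieYear <- as.integer(sub(".*([0-9]{4}).*", "\\1", movieYear))

movies <- data.frame(name = movieName, year = movieYear, stringsAsFactors = FALSE)
print(movies)
```

Building the data frame only works when every selector returns one value per movie; if a field is missing for some entries, it is safer to select each movie container first and extract fields from it.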
