
How to Scrape Data with a MATLAB Crawler


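Many of you have probably heard of web scraping with Python. Today we look at doing the same thing with MATLAB. No prior experience is needed; the aim is simply to give you a hands-on feel for fetching data from a web page yourself.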

  1. Open your installed copy of MATLAB.
  2. Create a new script file and paste in the following code.
clc;
clear;
warning off;
year = 2015;
for season = 1:4
    fprintf('Fetching data for %d, quarter %d...', year, season)
    % The URL contains two %d placeholders, so both year and season are passed in.
    [sourcefile, status] = urlread(sprintf('http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/000001/type/S.phtml?year=%d&season=%d', year, season));
    if ~status  % check whether the page was read successfully
        error('Something went wrong, please check.')
    end

    expr1 = '\s+(\d\d\d\d-\d\d-\d\d)\s*';  % pattern to extract; the part inside () is what gets captured
    % 'match' returns each full match, 'tokens' returns the text captured by (); both are cell arrays
    [datefile, date_tokens] = regexp(sourcefile, expr1, 'match', 'tokens');
    dates = cell(size(date_tokens));  % cell array of the same size
    for idx = 1:length(date_tokens)
        dates{idx} = date_tokens{idx}{1};  % store each date string
    end

    expr2 = '<div align="center">(\d*\.?\d*)</div>';
    [datafile, data_tokens] = regexp(sourcefile, expr2, 'match', 'tokens');  % pull the target values out of the page source
    data = zeros(size(data_tokens));  % zeros with the same length as the extracted values
    for idx = 1:length(data_tokens)
        data(idx) = str2double(data_tokens{idx}{1});  % convert to numbers and store in data
    end
    data = reshape(data, 6, [])';  % per the page source, each row of the table holds six values, so arrange them into six columns

    items = {'Date' 'Open' 'High' 'Close' 'Low' 'Volume' 'Turnover'};
    sheet = sprintf('Q%d', season);  % worksheet name
    xlswrite('D:/data', items, sheet)
    xlswrite('D:/data', dates', sheet, 'A2');  % write the dates into the first column
    range = sprintf('B2:%s%d', char(double('B')+size(data,2)-1), size(data,1)+1);  % cell range that will hold the extracted values
    xlswrite('D:/data', data, sheet, range);
    fprintf('done!\n')
end
fprintf('All done! The data is saved in the data spreadsheet on drive D.\n')

  3. Wait patiently. Once the completion message appears, open the data spreadsheet on drive D and take a look at your results.
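To see what the parsing part of the script does without touching the network or Excel, the two regular expressions and the reshape step can be tried on a small hard-coded string. The sketch below is only an illustration: the `sample` fragment and all of its numbers are made up by hand to imitate the layout the patterns expect, and the variable names `dateTokens`, `valueTokens` and `table6` are hypothetical.

```matlab
% Hypothetical fragment: two hand-written "rows", each with one date and six
% values. The numbers are invented and do not come from the real page.
sample = [ ...
    '<a> 2015-03-31 </a>' ...
    '<div align="center">1.5</div><div align="center">2.5</div>' ...
    '<div align="center">3.5</div><div align="center">4.5</div>' ...
    '<div align="center">100</div><div align="center">200</div>' ...
    '<a> 2015-03-30 </a>' ...
    '<div align="center">5.5</div><div align="center">6.5</div>' ...
    '<div align="center">7.5</div><div align="center">8.5</div>' ...
    '<div align="center">300</div><div align="center">400</div>'];

% Same date pattern as the script: whitespace, then a captured yyyy-mm-dd.
dateTokens = regexp(sample, '\s+(\d\d\d\d-\d\d-\d\d)\s*', 'tokens');
dates = cellfun(@(t) t{1}, dateTokens, 'UniformOutput', false);

% Same value pattern as the script: a number inside a centered div.
valueTokens = regexp(sample, '<div align="center">(\d*\.?\d*)</div>', 'tokens');
values = cellfun(@(t) str2double(t{1}), valueTokens);

% The values arrive as one flat row vector. reshape fills column by column,
% so build a 6-by-N matrix first and transpose it to get one row per date.
table6 = reshape(values, 6, [])';

disp(dates');
disp(table6);
```

Each element of a `'tokens'` result is itself a cell holding the captured groups, which is why the script indexes it as `date_tokens{idx}{1}`.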

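Curious readers may want to dig into how this actually works, which takes some real effort. Here are a few suggestions:

  • Learn to read the code, and search online for anything you don't understand. Working things out yourself is the best way to learn, so I won't go into detail here.
  • Once you understand the code and have learned regular expressions, open the page source.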

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>上證綜合指數(000001)_歷史交易_新浪網</title>
<meta name="Keywords" content="上證綜合指數,000001,行情" />
<meta name="Description" content="上證綜合指數的實時行情" />
<link media="all" rel="stylesheet" href="/corp/view/css/style.css" />
<link media="all" rel="stylesheet" href="/corp/view/css/newstyle.css" />
<link media="all" rel="stylesheet" href="/corp/view/css/tables.css" />
<link media="all" rel="stylesheet" href="/corp/view/css/style4.css" />
<style type="text/css">
body,ul,ol,li,p,h1,h2,h3,h4,h5,h6,form,fieldset,table,td,img,div{margin:0;padding:0;border:0;}
body,ul,ol,li,p,form,fieldset,table,td{font-family:"宋體";}
body{background:#fff;color:#000;}
td,p,li,select,input,textarea,div{font-size:12px;}

ul{list-style-type:none;}
select,input{vertical-align:middle; padding:0; margin:0;}

.f14 {font-size:14px;}
.lh20 {line-height:20px;}
.lh23{line-height:23px;}
.b1{border:1px #fcc solid;}

a{text-decoration: underline;color:#009}
a:visited{color:#333333;}
a:hover{color:#f00;}

.f14links{line-height:23px;}
.f14links,.f14links a{font-size:14px;color:#009;}
.f14links a:hover{color:#F00;}
.f14links li{padding-left:13px;background:url(http://image2.sina.com.cn/dy/legal/2006index/news_law_hz_012.gif) no-repeat 3px 45%;}

.clearit{clear:both;font-size:0;line-height:0;height:0;}
.STYLE2 {font-size: 14px; font-weight: bold; }

/*杜邦分析用到的css begin*/
.bottom_line {border-bottom:1px solid #999999}
.f14 {font-size:14px}
.f12 {font-size:12px}

.l15{line-height:150%}
.l13{line-height:130%}
.lh19{line-height:19px;}
/*杜邦分析用到的css end*/
</style>
<!--[if IE]>
<link media="all" rel="stylesheet" href="http://www.sinaimg.cn/cj/realstock/css/ie.css" />
<![endif]-->
<script language="javascript" type="text/javascript">
<!--//--><![CDATA[//><!--
var fullcode="sh000001";
var chart_img_alt = "上證綜合指數 000001 行情圖";

/* comment */
var cmnt_channel    = "gg";
var cmnt_newsid     = "sh-000001";
var cmnt_group      = 1;

var detailcache = new Array();
//--><!]]>
</script>
<script type="text/javascript" src="/corp/view/js/all.js"></script>
<script type="text/javascript" src="/corp/view/js/tables.js"></script>

<script type="text/javascript" src="http://finance.sina.com.cn/iframe/hot_stock_list.js"></script>
<script type="text/javascript" src="http://hq.sinajs.cn/list=sh000001,s_sh000001,s_sh000300,s_sz399001,s_sz399106,s_sz395099"></script>
<script type="text/javascript" src="http://image2.sina.com.cn/home/sinaflash.js"></script>

<script type="text/javascript" src="/corp/view/js/corp_fenshi_zs.js"></script> 

</head>
<body>

<div id="wrap">
<!-- 標準二級導航_財經 begin -->
<style type="text/css">
.secondaryHeader{height:33px;overflow:hidden;background:url(http://i2.sinaimg.cn/dy/images/header/2008/standardl2nav_bg.gif) repeat-x #fff;color:#000;font-size:12px;font-weight:100;}
.secondaryHeader a,.secondaryHeader a:visited{color:#000;text-decoration:none;}
.secondaryHeader a:hover,.secondaryHeader a:active{color:#c00;text-decoration:underline;}
.sHBorder{border:1px #e3e3e3 solid;padding:0 10px 0 12px;overflow:hidden;zoom:1;}
.sHLogo{float:left;height:31px;line-height:31px;overflow:hidden;}
.sHLogo span,.sHLogo span a,.sHLogo span a:link,.sHLogo span a:visited,.sHLogo span a:hover{display:block;*float:left;display:table-cell;vertical-align:middle;*display:block;*font-size:27px;*font-family:Arial;height:31px;}
.sHLogo span,.sHLogo span a img,.sHLogo span a:link img,.sHLogo span a:visited img,.sHLogo span a:hover img{vertical-align:middle;}
.sHLinks{float:right;line-height:31px;}
#level2headerborder{background:#fff; height:5px; overflow:hidden; clear:both; width:950px;}
</style>
<div id="level2headerborder"></div>
<div class="secondaryHeader">
    <div class="sHBorder">
        <div class="sHLogo"><span><a href="http://www.sina.com.cn/"><img src="http://i1.sinaimg.cn/dy/images/header/2009/standardl2nav_sina_new.gif" alt="新浪網" /></a><a href="http://finance.sina.com.cn/"><img src="http://i1.sinaimg.cn/dy/images/header/2009/standardl2nav_finance.gif" alt="新浪財經" /></a></span></div>
        <div class="sHLinks"><a href="http://finance.sina.com.cn/">財經首頁</a>&nbsp;|&nbsp;<a href="http://www.sina.com.cn/">新浪首頁</a>&nbsp;|&nbsp;<a href="http://news.sina.com.cn/guide/">新浪導航</a></div>
    </div>
</div>
<div id="level2headerborder"></div>
<!-- 標準二級導航_財經 end -->
  <!-- banner begin -->
  <div style="float:left; width:950px;">
    <!-- 頂部廣告位 begin -->
    <div style="float:left; width:750px; height:90px;">
        <iframe marginheight="0" marginwidth="0" src="http://finance.sina.com.cn/iframe/ad/PDPS000000004094.html" frameborder="0" height="90" scrolling="no" width="750"></iframe><!--<script type="text/javascript" src="http://finance.sina.com.cn/pdps/js/PDPS000000004094.js"></script> --> 
    </div>
    <!-- 頂部廣告位 end -->
    <div style="float:right;width:188px; height:88px; border:1px solid #DEDEDE;">
        <ul>
            <li style="background:url(http://www.sinaimg.cn/bb/article/con_ws_001.gif);line-height:15px;text-align:center;color:#F00">熱點推薦</li>

            <li style="line-height:20px; margin-top:5px;">·<a href="http://vip.stock.finance.sina.com.cn/portfolio/main.php" style="color:#F00">自選股-輕鬆管理您的千隻股票</a></li>

            <li style="line-height:20px;">·<a href="http://finance.sina.com.cn/money/mall.shtml">金融e路通-理財投資更輕鬆</a></li>
            <li style="line-height:20px;">·<a href="http://biz.finance.sina.com.cn/hq/">行情中心-通往財富之門</a></li>
        </ul>
    </div>
    <div style="clear:both"></div>

  </div>

  <!-- banner end -->
  <div class="HSpace-1-5"></div>
  <!-- 導航 begin -->
  <div class="nav">
    <ul>
      <li class="navRedLi"><a href="http://finance.sina.com.cn/" target="_blank">財經首頁</a></li>
      <li id="nav01"><a href="http://finance.sina.com.cn/stock/index.shtml" target="_blank">股票</a></li>
      <li id="nav02"><a href="http://finance.sina.com.cn/fund/index.shtml" target="_blank">基金</a></li>
      <li id="nav03"><a href="http://finance.sina.com.cn/stock/roll.shtml" target="_blank">滾動</a></li>
      <li id="nav04"><a href="http://vip.stock.finance.sina.com.cn/corp/view/vCB_BulletinGather.php" target="_blank">公告</a></li>
      <li id="nav05"><a href="http://finance.sina.com.cn/column/jsy.html" target="_blank">大盤</a></li>
      <li id="nav06"><a href="http://finance.sina.com.cn/column/ggdp.html" target="_blank">個股</a></li>
      <li id="nav07"><a href="http://finance.sina.com.cn/stock/newstock/index.shtml" target="_blank">新股</a></li>
      <li id="nav08"><a href="http://finance.sina.com.cn/stock/warrant/index.shtml" target="_blank">權證</a></li>
      <li id="nav09"><a href="http://finance.sina.com.cn/stock/reaserchlist.shtml" target="_blank">報告</a></li>
      <li id="nav10"><a href="http://finance.sina.com.cn/money/globalindex/index.shtml" target="_blank">環球市場</a></li>   
      <li id="nav11"><a href="http://blog.sina.com.cn/lm/finance/index.html" target="_blank">部落格</a></li>
      <li id="nav12"><a href="http://finance.sina.com.cn/bar/" target="_blank">股票吧</a></li>
      <li id="nav13"><a href="http://finance.sina.com.cn/stock/hkstock/index.shtml" target="_blank">港股</a></li>
      <li id="nav14"><a href="http://finance.sina.com.cn/stock/usstock/index.shtml" target="_blank">美股</a></li>      
      <li id="nav15"><a href="http://biz.finance.sina.com.cn/hq/" target="_blank">行情中心</a></li>
      <li id="nav16"><a href="http://vip.stock.finance.sina.com.cn/portfolio/main.php" target="_blank">自選股</a></li>
   </ul>
  </div>
  <!-- 導航 end -->
  <!-- 導航下 begin -->
  <div class="navbtm">
    <div class="navbtmblk1"><span id="idxsh000001"><a href="http://finance.sina.com.cn/realstock/company/sh000001/nc.shtml" target="_blank">上證指數</a>: 0000.00 0.00 00.00億元</span> | <span id="idxsz399001"><a href="http://finance.sina.com.cn/realstock/company/sz399001/nc.shtml" target="_blank">深圳成指</a>: 0000.00 0.00 00.00億元</span> | <span id="idxsh000300"><a href="http://finance.sina.com.cn/realstock/company/sh000300/nc.shtml" target="_blank">滬深300</a>: 0000.00 0.00 00.00億元</span></div>

    <div class="navbtmmaquee">
      <script type="text/javascript" src="http://finance.sina.com.cn/286/20061129/3.js"></script>
      <script type="text/javascript" language="javascript">
        <!--//--><![CDATA[//><!--
        if(!document.layers) {
            with (document.getElementsByTagName("marquee")[0]) {
                scrollDelay = 50;
                scrollAmount = 2;
                onmouseout = function () {
                    this.scrollDelay = 50;
                };
            }
        }
        //--><!]]>
      </script>
    </div>
  </div>
  <!-- 導航下 end -->
  <div class="HSpace-1-6"></div>

  <div id="main">

    <!-- 左側 begin -->
    <div id="left">
      <!-- 最近訪問股|我的自選股 begin -->
      <div class="LBlk01">
        <!-- 標籤 begin -->
        <ul class="LTab01">
          <li class="Menu01On" id="m01-0">最近訪問股</li>

          <li class="Menu01Off" id="m01-1">我的自選股</li>

        </ul>
        <!-- 標籤 end -->
        <!-- 內容 begin -->
        <div id="con01-0"></div>
        <div id="con01-1" style="display:none;"></div>
        <!-- 內容 end -->
      </div>

      <!-- 最近訪問股|我的自選股 end -->
      <div class="HSpace-1-10"></div>

      <!-- 選單 begin -->
      <div class="Menu-Ti" id="navlf00"><img src="http://www.sinaimg.cn/cj/realstock/image2/finance_in_ws_010.gif" alt="" id="tImg0"/><span class="capname">每日必讀</span></div>
      <div class="Menu-Con" id="item0" style="display:block;">
        <table cellspacing="0">
          <tr>
            <td>·<a href="http://stock.finance.sina.com.cn/" target="_self">股市必察</a></td>

            <td>·<a href="http://biz.finance.sina.com.cn/stock/company/notice.php?kind=daily" target="_self" class="incolor">每日提示</a></td>

          </tr>
          <tr>
            <td>·<a href="/corp/go.php/vRPD_QuickView/.phtml" target="_self">公司快報</a></td>
            <td>·<a href="/corp/go.php/vRPD_NewStockIssue/page/1.phtml" target="_self">新股上市</a></td>

          </tr>
          <tr>
            <td>·<a href="http://vip.stock.finance.sina.com.cn/q/go.php/vInvestConsult/kind/lhb/index.phtml" target="_self">龍虎榜</a></td>

            <td>·<a href="http://vip.stock.finance.sina.com.cn/q/go.php/vIR_EndRise/index.phtml" target="_self" class="incolor">每日熱股</a></td>
          </tr>
          <tr>

            <td colspan='2'>·<a href="http://finance.sina.com.cn/realstock/income_statement/2012-06-30/issued_pdate_de_1.html" target="_self" class="incolor">中報速遞</a></td>
          </tr>
        </table>
      </div>
      <!--<div class="HSpace-1-10"></div> -->
      <div class="Menu-Ti" id="navlf01"><img src="http://www.sinaimg.cn/cj/realstock/image2/finance_in_ws_010.gif" alt="" id="tImg1"/><span class="capname">指數資料</span></div>

      <div class="Menu-Con" id="item1" style="display:block;">

        <table cellspacing="0">
          <tr>
            <td>·<a href="/corp/go.php/vII_BasicInfo/indexid/000001.phtml" target="_self">基本屬性</a></td>
            <td>·<a href="/corp/go.php/vII_NewestComponent/indexid/000001.phtml" target="_self">最新成分</a></td>
          </tr>

          <tr>
            <td>·<a href="/corp/go.php/vII_HistoryComponent/indexid/000001.phtml" target="_self">歷史成分</a></td>

            <