
[Writing a Crawler with HttpClient] Submitting JSON Data and Ordinary Key-Value Pairs via POST

Preface

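Sooner or later, anyone developing a crawler has to submit data with POST.
This article compares two ways of submitting data and uses HttpClient to simulate a website's POST submission for each.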

I have seen two kinds of POST submission:

  1. ordinary key-value pairs (a form submission);
  2. JSON data.
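The two kinds differ in the request body and in its Content-Type header. The following is a minimal JDK-only sketch that shows what each body looks like on the wire. The class and method names (PostBodyFormats, formBody) are hypothetical, and URLEncoder.encode(String, Charset) requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PostBodyFormats {

    static final String FORM_CONTENT_TYPE = "application/x-www-form-urlencoded";
    static final String JSON_CONTENT_TYPE = "application/json; charset=UTF-8";

    // Form body: key=value pairs joined by '&', each key and value percent-encoded
    // (URLEncoder turns a space into '+', as form encoding expects).
    public static String formBody(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("username", "vip");
        params.put("password", "secret");

        // 1. Key-value (form) submission
        System.out.println("Content-Type: " + FORM_CONTENT_TYPE);
        System.out.println(formBody(params));

        // 2. JSON submission: the whole body is a single JSON document
        System.out.println("Content-Type: " + JSON_CONTENT_TYPE);
        System.out.println("{\"username\":\"vip\",\"password\":\"secret\"}");
    }
}
```

With the map above, the form body is username=vip&password=secret. The JSON body carries the same two fields as one JSON object.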

The HttpClient version I use:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>

Submitting ordinary key-value pairs

CloseableHttpClient httpclient = HttpClients.createDefault();

HttpPost httpPost = new HttpPost("http://targethost/login");
// Form fields, sent as username=vip&password=secret
List<NameValuePair> nvps = new ArrayList<NameValuePair>();
nvps.add(new BasicNameValuePair("username", "vip"));
nvps.add(new BasicNameValuePair("password", "secret"));
// Pass the charset explicitly: without it, UrlEncodedFormEntity encodes as ISO-8859-1
httpPost.setEntity(new UrlEncodedFormEntity(nvps, Consts.UTF_8));
CloseableHttpResponse response2 = httpclient.execute(httpPost);
try {
    System.out.println(response2.getStatusLine());
    HttpEntity entity2 = response2.getEntity();
    // do something useful with the response body
    // and ensure it is fully consumed
    EntityUtils.consume(entity2);
} finally {
    response2.close();
}
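On the server, a key-value POST is usually not parsed by hand. For an application/x-www-form-urlencoded request, the servlet container decodes the body and exposes the fields through request.getParameter("username"). The following rough sketch shows what that decoding amounts to. It is not the container's real code: the class FormBodyParser is hypothetical, it assumes UTF-8, and unlike a container it keeps only the last value of a repeated key.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodyParser {

    // Splits "k1=v1&k2=v2" into a map, percent-decoding each key and value as UTF-8.
    // Simplification: a repeated key keeps only its last value.
    public static Map<String, String> parse(String body) {
        Map<String, String> result = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) {
            return result;
        }
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            result.put(decode(key), decode(value));
        }
        return result;
    }

    private static String decode(String s) {
        // URLDecoder.decode(String, Charset) requires Java 10+; it also turns '+' into a space
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(parse("username=vip&password=secret"));
        // prints {username=vip, password=secret}
    }
}
```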

Submitting JSON data

Data to submit:

{
    "username" : "vip",
    "password" : "secret"
}
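To write this payload as a Java string literal, you have to escape every quote by hand, as the jsonStr in the code below does. Any value that contains a " or \ would then produce broken JSON. In practice a JSON library (for example Jackson, Gson, or org.json's JSONObject) handles this. The following minimal sketch shows the escaping such a library performs. The class and method names (JsonBuilder, quote, loginJson) are hypothetical.

```java
public class JsonBuilder {

    // Turns a Java string into a JSON string literal:
    // quotes, backslashes and control characters are escaped.
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters use the six-character hex escape form
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds the login payload shown above from arbitrary values.
    public static String loginJson(String username, String password) {
        return "{\"username\":" + quote(username) + ",\"password\":" + quote(password) + "}";
    }

    public static void main(String[] args) {
        System.out.println(loginJson("vip", "secret"));
        // prints {"username":"vip","password":"secret"}
    }
}
```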

Code:

import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

/**
 * Created by CarlZhang on 2017/1/1.
 */
public class PostJsonTest {
    public static void main(String[] args) {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        try {

            HttpPost httpPost = new HttpPost("http://targethost/login");

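            // JSON data: {"username":"vip","password":"secret"}
            String jsonStr = "{\"username\":\"vip\",\"password\":\"secret\"}";

            // Declare the body as UTF-8 JSON. Content-Encoding describes compression
            // (e.g. gzip), so the charset belongs in Content-Type instead.
            StringEntity se = new StringEntity(jsonStr, Consts.UTF_8);
            se.setContentType("application/json; charset=UTF-8");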

            httpPost.setEntity(se);
            CloseableHttpResponse response2 = httpclient.execute(httpPost);

            try {
                System.out.println(response2.getStatusLine());
                HttpEntity entity2 = response2.getEntity();
                // EntityUtils.toString reads the response body to the end,
                // which also consumes the entity, so no separate consume() call is needed
                String res = EntityUtils.toString(entity2);
                System.out.println(res);
            } finally {
                response2.close();
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                httpclient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
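On Java 11 or later, you can also build the same request with the JDK's built-in java.net.http.HttpClient instead of Apache HttpClient. This is a swapped-in alternative, not the approach used above. The class name JdkPostJson is hypothetical, and http://targethost/login is the same placeholder URL as before, so main only builds the request and does not send it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class JdkPostJson {

    // Builds a POST request whose body is the given JSON string, encoded as UTF-8.
    public static HttpRequest buildRequest(String url, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    // Sends the request, prints the status code and returns the body as a String.
    public static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        return response.body();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://targethost/login",
                "{\"username\":\"vip\",\"password\":\"secret\"}");
        System.out.println(request.method() + " " + request.uri());
        // Once the URL points at a real endpoint: System.out.println(send(request));
    }
}
```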

Example: JSON submission

I clicked "post" on a certain website and opened Chrome's debug tools (press F12), which show the POST request I submitted. The Form Data section holds the JSON data submitted via POST.
[Screenshot: the submitted POST request in Chrome's debug tools]

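How does the back end get this data?
The site uses the Requests class from the open-source library Latke.
As the code below shows, it reads the request body as a stream through a Reader object.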

    /**
     * Gets the request json object with the specified request.
     *
     * @param request the specified request
     * @param response the specified response, sets its content type with "application/json"
     * @return a json object
     * @throws ServletException servlet exception
     * @throws IOException io exception
     */
    public static JSONObject parseRequestJSONObject(final HttpServletRequest request, final HttpServletResponse response)
        throws ServletException, IOException {
        response.setContentType("application/json");

        final StringBuilder sb = new StringBuilder();
        BufferedReader reader;

        final String errMsg = "Can not parse request[requestURI=" + request.getRequestURI() + ", method=" + request.getMethod()
            + "], returns an empty json object";

        try {
            try {
                reader = request.getReader();
            } catch (final IllegalStateException illegalStateException) {
                reader = new BufferedReader(new InputStreamReader(request.getInputStream()));
            }

            String line = reader.readLine();

            while (null != line) {
                sb.append(line);
                line = reader.readLine();
            }
            reader.close();

            String tmp = sb.toString();

            if (Strings.isEmptyOrNull(tmp)) {
                tmp = "{}";
            }

            return new JSONObject(tmp);
        } catch (final Exception ex) {
            LOGGER.log(Level.ERROR, errMsg, ex);

            return new JSONObject();
        }
    }
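The following JDK-only sketch reduces the Latke method above to its core: read the body line by line, and treat an empty or blank body as {}. The class and method names (RequestBodyReader, readJsonBody) are hypothetical. In a servlet you would pass in request.getReader(). One difference from the method above: try-with-resources closes the reader even when reading fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class RequestBodyReader {

    // Reads the whole body line by line and returns it as one string.
    // An empty or blank body becomes "{}", so the result can always be parsed as JSON.
    // Line breaks are dropped; JSON never allows a raw line break inside a string,
    // only as whitespace between tokens, so valid JSON stays valid.
    public static String readJsonBody(Reader in) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String body = sb.toString();
        return body.trim().isEmpty() ? "{}" : body;
    }

    public static void main(String[] args) {
        String raw = "{\"username\":\"vip\",\n\"password\":\"secret\"}";
        System.out.println(readJsonBody(new StringReader(raw)));
        // prints {"username":"vip","password":"secret"}
    }
}
```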

On the front end, the JS code submits the JSON data with the jQuery library.
[Screenshot: the front-end jQuery code that submits the JSON data]

Example: ordinary POST submission

[Screenshot: an ordinary key-value POST request]