Submitting historical resource records to Baidu Xiongzhanghao (熊掌號) in Java

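I recently had a requirement to submit a large number of historical URLs to the Baidu Xiongzhanghao resource search platform. Xiongzhanghao does provide a manual submission tool, but it is slow and laborious, and clearly inefficient when there are many URLs to submit, so submitting through the provided API is the better option.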


1. Getting a Baidu Xiongzhanghao account (you can apply for one yourself on Baidu)

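2. The figure above is the official API description (you need to log in to your own account to see it). With it you already know most of what is needed to submit data in bulk, but a few points are worth spelling out: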

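    1) For bulk submission, the type parameter in the URL must be set to batch.

    2) A single request can carry at most 2000 URLs; beyond that the API returns an "over the submission limit" error.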
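The two points above can be sketched in isolation before looking at the full implementation. This is only a minimal sketch: the class name `BatchPushSketch`, the helper names, the endpoint `https://example.invalid/urls`, and the `appid` and `token` query parameters are placeholders assumed for illustration. Only `type=batch` and the 2000-URL limit come from the notes above; copy the real endpoint and credentials from the official API page.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchPushSketch {
    // Per-request limit stated in the notes above
    static final int MAX_URLS_PER_REQUEST = 2000;

    /** Splits the list into consecutive groups of at most `size` elements. */
    static <T> List<List<T>> partition(List<T> items, int size) {
        List<List<T>> groups = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            groups.add(new ArrayList<>(items.subList(start, end)));
        }
        return groups;
    }

    /**
     * Builds the submission URL. The endpoint and the appid/token parameter
     * names are assumptions; only type=batch is taken from the article.
     */
    static String buildPushUrl(String endpoint, String appId, String token) {
        return endpoint + "?appid=" + appId + "&token=" + token + "&type=batch";
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 4500; i++) {
            urls.add("http://example.com/page/" + i);
        }
        // 4500 URLs are split into groups of 2000, 2000 and 500
        for (List<String> group : partition(urls, MAX_URLS_PER_REQUEST)) {
            System.out.println(group.size());
        }
        System.out.println(buildPushUrl("https://example.invalid/urls", "APP_ID", "TOKEN"));
    }
}
```

Stepping through the list by start offset avoids computing the number of groups up front, so there is no separate remainder case to get wrong.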

3. Code implementation

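import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;

import com.alibaba.fastjson.JSONObject; // assumed: fastjson, which provides JSONObject.parseObject(String, Class)
import org.apache.commons.collections.CollectionUtils; // assumed: the original does not show which CollectionUtils is used
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.StatusLine;
import org.apache.http.client.HttpResponseException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;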

// Create the client once and reuse it; automatic retries are disabled
private static final CloseableHttpClient client =
        HttpClientBuilder.create().disableAutomaticRetries().build();

private void urlsPush(List<String> urlList, String type) {
    if (CollectionUtils.isEmpty(urlList)) {
        return;
    }
    // Submit in groups of at most 2000 URLs
    int times = urlList.size() % 2000 == 0 ? (urlList.size() / 2000) : (urlList.size() / 2000 + 1);
    for (int i = 0; i < times; i++) {
        int end = (i + 1) * 2000;
        if (end >= urlList.size()) {
            end = urlList.size();
        }
        List<String> subList = urlList.subList(i * 2000, end);
        StringBuilder sb = new StringBuilder();
        subList.stream().forEach(url -> {
            sb.append(url);
            sb.append("\r\n");
        });
        String params = sb.toString();
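        // saveAsFile(params, type);
        // Decide whether to submit or to save to a file: once the remaining quota
        // reported by the API is 2000 or less, save the batch instead of submitting it
        if (maxPushSize <= 2000) {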
            saveAsFile(params, type);
            continue;
        }
        HttpPost request = new HttpPost(URL);
        request.setHeader("content-type", "text/plain");
        // Encode as UTF-8 explicitly instead of relying on the platform default charset
        HttpEntity entity = new StringEntity(params, Charset.forName("UTF-8"));
        request.setEntity(entity);

        String respStr = "";
        logger.info("Pushing data: {} URLs in this batch, content type: {}", subList.size(), type);
        long singlePushStart = System.currentTimeMillis();
        // try-with-resources closes the response so the connection is released
        try (CloseableHttpResponse response = client.execute(request)) {
            logger.info("Single push finished in {} ms", System.currentTimeMillis() - singlePushStart);
            StatusLine statusLine = response.getStatusLine();
            HttpEntity responseEntity = response.getEntity();
            if (statusLine.getStatusCode() != 200 || responseEntity == null) {
                logger.info("Failed to get a valid response");
                saveAsFile(params, type);
                continue;
            }
            respStr = EntityUtils.toString(responseEntity);
        } catch (IOException e) {
            // Covers failures while sending the request and while reading the body
            logger.error("Failed to push data", e);
            saveAsFile(params, type);
            continue;
        }

        if (StringUtils.isNotBlank(respStr)) {
            try {
                PushUrlsResponse result = JSONObject.parseObject(respStr, PushUrlsResponse.class);
                // Track the remaining quota and the running total of accepted URLs
                this.maxPushSize = result.getRemain_batch();
                this.successSize += result.getSuccess_batch();
            } catch (Exception e) {
                logger.info("Failed to parse the response, body: {}", respStr);
                saveAsFile(params, type);
            }
        }

    }

}
/**
 * Response to a URL submission
 */
private static class PushUrlsResponse {
    /**
     * Number of URLs submitted successfully
     */
    private int success_batch;
    /**
     * Remaining number of URLs that can still be submitted
     */
    private int remain_batch;

    public int getSuccess_batch() {
        return success_batch;
    }

    public void setSuccess_batch(int success_batch) {
        this.success_batch = success_batch;
    }

    public int getRemain_batch() {
        return remain_batch;
    }

    public void setRemain_batch(int remain_batch) {
        this.remain_batch = remain_batch;
    }
}

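I write the data that was not submitted successfully to a file, which is why the code calls a saveAsFile method.

The parameter param is the URL content being submitted, and type marks the kind of content in the file (my URLs fall into many categories, so a type is needed to label what each file contains).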
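The article does not show saveAsFile itself, so the following is only a guess at its shape. It is a minimal sketch that assumes one file per type, named `<type>.txt`, with failed batches appended to it. The class name `FailedUrlStore`, the directory argument, and the file naming are all hypothetical. The original probably relies on commons-io `FileUtils`, given the import; this version uses only the JDK so that it runs on its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FailedUrlStore {
    // Directory that holds one file per content type (hypothetical layout)
    private final Path dir;

    FailedUrlStore(Path dir) {
        this.dir = dir;
    }

    /** Appends param to <dir>/<type>.txt, creating the file if needed, and returns its path. */
    Path saveAsFile(String param, String type) throws IOException {
        Files.createDirectories(dir);
        Path file = dir.resolve(type + ".txt");
        Files.write(file, param.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return file;
    }

    public static void main(String[] args) throws IOException {
        FailedUrlStore store = new FailedUrlStore(Files.createTempDirectory("failed-urls"));
        // Two failed batches of the same type end up in the same file
        Path file = store.saveAsFile("http://example.com/a\r\nhttp://example.com/b\r\n", "article");
        store.saveAsFile("http://example.com/c\r\n", "article");
        System.out.println(file + " now holds "
                + Files.readAllLines(file, StandardCharsets.UTF_8).size() + " URLs");
    }
}
```

Appending rather than overwriting means that every failed batch of a given type accumulates in one file, which can later be read back and resubmitted once quota is available again.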