spring-retry: Retry and Circuit Breaking Explained

阿新 • Published: 2019-02-19

Reposted from: http://www.broadview.com.cn/article/233

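This article is supplementary material for Chapter 6, "Timeouts and Retry Mechanisms", of the book 《億級流量網站架構核心技術》 (Core Technologies of Website Architecture for Hundred-Million-Scale Traffic).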

The spring-retry project implements retry and circuit-breaker functionality and is already used by projects such as Spring Batch and Spring Integration.

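RetryOperations defines the retry API, and RetryTemplate is its template implementation. RetryTemplate is thread-safe and follows Spring's usual template-style API: it wraps the retry and circuit-breaker logic inside a template, giving users a robust API that is hard to misuse.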

First, the RetryOperations interface:

public interface RetryOperations {
   <T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback) throws E;
   <T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback, RecoveryCallback<T> recoveryCallback) throws E;
   <T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback, RetryState retryState) throws E, ExhaustedRetryException;
   <T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback, RecoveryCallback<T> recoveryCallback, RetryState retryState)
         throws E;
}

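RetryCallback wraps the business operation to be retried. Once the maximum retry time or the maximum number of attempts is exceeded, a RecoveryCallback can be invoked to recover, for example by returning mock data or fallback data.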
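To make the division of labor between RetryCallback and RecoveryCallback concrete, here is a minimal plain-Java sketch. It does not use spring-retry: `RetryWithRecovery` and `execute` are hypothetical names, `Supplier` stands in for RetryCallback, `Function` stands in for RecoveryCallback, and `maxAttempts` plays the role of the maximum number of attempts.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch, not spring-retry code: the "retry callback" is attempted up to
// maxAttempts times; if every attempt fails, the "recovery callback" turns the last
// exception into a fallback value instead of letting it propagate.
class RetryWithRecovery {

    static <T> T execute(Supplier<T> retryCallback,
                         Function<RuntimeException, T> recoveryCallback,
                         int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return retryCallback.get();          // the business operation
            } catch (RuntimeException e) {
                last = e;                            // remember the failure and try again
            }
        }
        if (recoveryCallback != null) {
            return recoveryCallback.apply(last);     // e.g. mock or fallback data
        }
        throw last;                                  // no recovery path: surface the failure
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = execute(() -> {
            calls[0]++;
            throw new RuntimeException("timeout");
        }, e -> "default", 3);
        System.out.println(result + " after " + calls[0] + " attempts"); // default after 3 attempts
    }
}
```

If the recovery function is null, the caller sees the last exception, which matches the "no RecoveryCallback" case described in the text.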

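When should you retry? spring-retry applies its retry policy when an exception is thrown, so when defining a policy you must specify which exceptions are retryable (a failed remote call can be retried, for example, but a failed input validation should not be). Read-only operations can be retried, and so can idempotent writes; non-idempotent writes must not be retried, because a retry can cause dirty writes or duplicate data.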
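Deciding which exceptions are retryable is essentially a classification step. The sketch below is a hypothetical, library-free illustration (the `RetryableClassifier` class is an assumption, not a spring-retry type): it walks up the exception's class hierarchy and uses the first match, so a SocketTimeoutException inherits the "retry" rule of IOException, and a NumberFormatException inherits the "do not retry" rule of IllegalArgumentException.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.Map;

// Hypothetical sketch: decide whether an exception is retryable by looking up its
// class, then each superclass, in a table. Similar in spirit to the retryable-exception
// map a SimpleRetryPolicy can be configured with, but not spring-retry code.
final class RetryableClassifier {
    private final Map<Class<? extends Throwable>, Boolean> table;
    private final boolean defaultValue;

    RetryableClassifier(Map<Class<? extends Throwable>, Boolean> table, boolean defaultValue) {
        this.table = table;
        this.defaultValue = defaultValue;
    }

    boolean isRetryable(Throwable t) {
        for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
            Boolean retryable = table.get(c);
            if (retryable != null) {
                return retryable;                  // closest match in the hierarchy wins
            }
        }
        return defaultValue;                       // exceptions not listed at all
    }

    public static void main(String[] args) {
        RetryableClassifier classifier = new RetryableClassifier(
                Map.of(IOException.class, true,                 // remote call failed: retry
                       IllegalArgumentException.class, false),  // bad input: never retry
                false);
        System.out.println(classifier.isRetryable(new SocketTimeoutException("read timed out"))); // true
        System.out.println(classifier.isRetryable(new NumberFormatException("abc")));             // false
    }
}
```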

Which retry policies are available? spring-retry provides the following RetryPolicy implementations:

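NeverRetryPolicy: allows RetryCallback to be invoked only once; no retries.
AlwaysRetryPolicy: retries indefinitely until the call succeeds; with faulty logic this can cause an infinite loop.
SimpleRetryPolicy: retries a fixed number of times; the default maximum is 3 attempts (counting the first call). This is the policy RetryTemplate uses by default.
TimeoutRetryPolicy: allows retries within a specified timeout; the default timeout is 1 second.
CircuitBreakerRetryPolicy: a retry policy with circuit-breaker behavior. It takes three parameters, openTimeout, resetTimeout, and delegate, described in detail later.
CompositeRetryPolicy: combines several policies in one of two ways. The optimistic composite allows a retry as long as any one policy allows it; the pessimistic composite refuses a retry as soon as any one policy refuses it. In either mode, every policy in the composite is evaluated.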
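The policies above all answer one question: may another attempt be made? A minimal sketch under that assumption models each policy as a function of the attempts already made and the elapsed time. `AttemptPolicy` and its factory methods are hypothetical and only approximate spring-retry's semantics; for example, attempts are counted the way SimpleRetryPolicy counts them, with the first call counting as one attempt.

```java
// Hypothetical sketch of the policy ideas above (not spring-retry's classes):
// each policy answers "may we make another attempt?" given how many attempts
// have already been made and how long we have been trying.
interface AttemptPolicy {
    boolean canRetry(int attempts, long elapsedMillis);

    static AttemptPolicy never() { return (a, t) -> a < 1; }            // only the first call
    static AttemptPolicy always() { return (a, t) -> true; }            // unbounded: beware infinite loops
    static AttemptPolicy maxAttempts(int max) { return (a, t) -> a < max; }
    static AttemptPolicy timeout(long ms) { return (a, t) -> t <= ms; }

    // Optimistic: retry if ANY policy allows it. Pessimistic: retry only if ALL allow it.
    // Every policy is evaluated (no short-circuiting), as described above.
    static AttemptPolicy composite(boolean optimistic, AttemptPolicy... policies) {
        return (a, t) -> {
            boolean any = false;
            boolean all = true;
            for (AttemptPolicy p : policies) {
                boolean ok = p.canRetry(a, t);
                any |= ok;
                all &= ok;
            }
            return optimistic ? any : all;
        };
    }
}
```

Combining maxAttempts(3) and timeout(1000) pessimistically stops at whichever limit is reached first; combining them optimistically keeps going until both limits are exceeded.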

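What backoff should be used between retries: retry immediately, or wait first? For a network error, for example, an immediate retry is likely to fail right away, so it is better to wait briefly before retrying. You also want to keep many services from retrying at the same moment and effectively DDoS-ing the downstream system.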

BackOffPolicy provides the following implementations:

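NoBackOffPolicy: no backoff; retries happen immediately.
FixedBackOffPolicy: fixed-interval backoff. It takes sleeper and backOffPeriod: sleeper is the waiting strategy (Thread.sleep by default, i.e. the thread sleeps), and backOffPeriod is the sleep time, 1 second by default.
UniformRandomBackOffPolicy: random-interval backoff. It takes sleeper, minBackOffPeriod, and maxBackOffPeriod, and sleeps for a random time in [minBackOffPeriod, maxBackOffPeriod]; minBackOffPeriod defaults to 500 ms and maxBackOffPeriod to 1500 ms.
ExponentialBackOffPolicy: exponential backoff. It takes sleeper, initialInterval, maxInterval, and multiplier: initialInterval is the first sleep time (100 ms by default), maxInterval is the maximum sleep time (30 seconds by default), and multiplier is the growth factor (2 by default), so the next sleep time is the current sleep time * multiplier.
ExponentialRandomBackOffPolicy: randomized exponential backoff that introduces a random multiplier. As mentioned above, a fixed multiplier can make many services retry at the same time and cause a DDoS; random sleep times avoid this.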
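The wait times produced by these strategies can be sketched as plain functions of the retry number n (n = 0 for the first retry). This is an illustration, not spring-retry's code: the `Backoff` class is hypothetical, and the jittered variant assumes a random factor in [1, multiplier), which is one common way to spread retries out and may differ from ExponentialRandomBackOffPolicy's exact formula.

```java
import java.util.Random;

// Hypothetical sketch (not spring-retry's classes) of how long each backoff
// strategy waits before retry number n (n = 0 for the first retry), in milliseconds.
final class Backoff {
    static long fixed(long period) {
        return period;                                        // same wait every time
    }

    static long uniformRandom(long min, long max, Random rnd) {
        return min + (long) (rnd.nextDouble() * (max - min)); // somewhere in [min, max)
    }

    static long exponential(long initial, double multiplier, long max, int n) {
        double interval = initial * Math.pow(multiplier, n);  // initial * multiplier^n
        return (long) Math.min(interval, max);                // capped at maxInterval
    }

    // Jittered variant: ASSUMES the exponential interval is scaled by a random factor
    // in [1, multiplier) so that many clients do not retry in lockstep.
    static long exponentialRandom(long initial, double multiplier, long max, int n, Random rnd) {
        long base = exponential(initial, multiplier, max, n);
        double factor = 1 + rnd.nextDouble() * (multiplier - 1);
        return (long) Math.min(base * factor, max);
    }

    public static void main(String[] args) {
        for (int n = 0; n < 6; n++) {
            System.out.print(exponential(100, 2.0, 3000, n) + " "); // 100 200 400 800 1600 3000
        }
        System.out.println();
    }
}
```

With initialInterval=100, multiplier=2 and maxInterval=3000 (the values used in the stateless example later in this article), the sixth wait would be 3200 ms but is capped at 3000 ms.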

That covers the basic concepts. Next, let's look at the main flow of RetryTemplate (a simplified excerpt of doExecute):

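protected <T, E extends Throwable> T doExecute(RetryCallback<T, E> retryCallback,
      RecoveryCallback<T> recoveryCallback, RetryState state)
      throws E, ExhaustedRetryException {
   //Retry policy
   RetryPolicy retryPolicy = this.retryPolicy;
   //Backoff policy
   BackOffPolicy backOffPolicy = this.backOffPolicy;
   //Retry context: the current retry count and similar state are recorded here
   RetryContext context = open(retryPolicy, state);
   Throwable lastException = null;
   boolean exhausted = false;
   try {
      //Interceptor pattern: invoke RetryListener#open; any listener can veto the retry
      boolean running = doOpenInterceptors(retryCallback, context);
      if (!running) {
         throw new TerminatedRetryException("Retry terminated abnormally by interceptor before first attempt");
      }
      //Start the backoff context (simplified: the real code may also reuse one stored in the retry context)
      BackOffContext backOffContext = backOffPolicy.start(context);
      //Check whether another attempt is allowed
      while (canRetry(retryPolicy, context) && !context.isExhaustedOnly()) {
         try {
            lastException = null;
            //Invoke the RetryCallback
            return retryCallback.doWithRetry(context);
         } catch (Throwable e) {//On failure, prepare for the next attempt
            lastException = e;
            //Register the exception, which increments the failure count
            registerThrowable(retryPolicy, state, context, e);
            //Invoke RetryListener#onError
            doOnErrorInterceptors(retryCallback, context, e);
            //If another attempt is allowed, back off first, e.g. sleep briefly before retrying
            if (canRetry(retryPolicy, context) && !context.isExhaustedOnly()) {
               backOffPolicy.backOff(backOffContext);
            }
            //shouldRethrow: state != null && state.rollbackFor(context.getLastThrowable())
            //In stateful retry, an exception that requires rollback is rethrown immediately
            if (shouldRethrow(retryPolicy, context, state)) {
               throw RetryTemplate.<E>wrapIfNecessary(e);
            }
         }
         //In stateful retry with the GLOBAL_STATE attribute set, leave the loop immediately. Execution only reaches
         //this point when the exception does not require rollback; CircuitBreakerRetryPolicy exits the loop here
         if (state != null && context.hasAttribute(GLOBAL_STATE)) {
            break;
         }
      }
      //Retries exhausted: invoke the RecoveryCallback if there is one, otherwise throw
      exhausted = true;
      return handleRetryExhausted(recoveryCallback, context, state);
   } catch (Throwable e) {
      throw RetryTemplate.<E>wrapIfNecessary(e);
   } finally {
      //Clean up
      close(retryPolicy, context, state, lastException == null || exhausted);
      //Invoke RetryListener#close, e.g. to record retry statistics
      doCloseInterceptors(retryCallback, context, lastException);
   }
}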
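The order in which doExecute calls its hooks (stateless case) can be traced with a hypothetical, library-free sketch that records each step instead of calling real RetryListeners. `RetryFlowTrace` and the event names are assumptions; backoff is only logged, not slept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified re-creation of the stateless path of doExecute above.
// Each step is recorded as an event instead of calling real RetryListeners,
// and backOff is only logged, not slept.
final class RetryFlowTrace {
    final List<String> events = new ArrayList<>();

    <T> T execute(Supplier<T> callback, Supplier<T> recovery, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        int attempts = 0;
        RuntimeException last = null;
        events.add("open");                               // doOpenInterceptors
        try {
            while (attempts < maxAttempts) {              // canRetry(retryPolicy, context)
                try {
                    events.add("attempt " + attempts);
                    return callback.get();                // retryCallback.doWithRetry(context)
                } catch (RuntimeException e) {
                    last = e;
                    attempts++;                           // registerThrowable
                    events.add("onError");                // doOnErrorInterceptors
                    if (attempts < maxAttempts) {
                        events.add("backOff");            // backOffPolicy.backOff
                    }
                }
            }
            if (recovery != null) {                       // handleRetryExhausted
                events.add("recover");
                return recovery.get();
            }
            throw last;
        } finally {
            events.add("close");                          // close + doCloseInterceptors
        }
    }
}
```

With two attempts that both fail, the trace is open, attempt 0, onError, backOff, attempt 1, onError, recover, close: no backoff follows the last failure, because canRetry is already false at that point.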

Stateful vs. stateless

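Stateless retry runs the whole retry policy inside one loop: the retry context lives within the current thread's call, and all retry decisions are made within a single invocation.

The simplest and most common case of stateless retry is a remote call to a query method.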

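import org.springframework.retry.RecoveryCallback;
import org.springframework.retry.RetryCallback;
import org.springframework.retry.RetryContext;
import org.springframework.retry.RetryPolicy;
import org.springframework.retry.backoff.ExponentialBackOffPolicy;
import org.springframework.retry.backoff.ThreadWaitSleeper;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

RetryTemplate template = new RetryTemplate();
//Retry policy: fixed number of attempts (3 attempts in total, including the first call)
RetryPolicy retryPolicy = new SimpleRetryPolicy(3);
template.setRetryPolicy(retryPolicy);
//Backoff policy: exponential backoff starting at 100 ms and doubling each time, capped at 3000 ms
ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
backOffPolicy.setInitialInterval(100);
backOffPolicy.setMaxInterval(3000);
backOffPolicy.setMultiplier(2);
backOffPolicy.setSleeper(new ThreadWaitSleeper());
template.setBackOffPolicy(backOffPolicy);

//Without a RecoveryCallback, the last exception is thrown once retries are exhausted
try {
    template.execute(new RetryCallback<String, RuntimeException>() {
        @Override
        public String doWithRetry(RetryContext context) throws RuntimeException {
            throw new RuntimeException("timeout");
        }
    });
} catch (RuntimeException e) {
    System.out.println("retries exhausted: " + e.getMessage());
}
//With a RecoveryCallback, its return value is used once retries are exhausted
String result = template.execute(new RetryCallback<String, RuntimeException>() {
    @Override
    public String doWithRetry(RetryContext context) throws RuntimeException {
        System.out.println("retry count:" + context.getRetryCount());
        throw new RuntimeException("timeout");
    }
}, new RecoveryCallback<String>() {
    @Override
    public String recover(RetryContext context) throws Exception {
        return "default";
    }
});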

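Stateful retry is needed in two situations: transactional operations that must roll back, and the circuit-breaker pattern.

In the rollback case, if the operation throws a database exception (DataAccessException), it must not be retried but rolled back, while other exceptions may be retried. This is expressed with RetryState: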

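import java.util.Collections;
import org.springframework.classify.BinaryExceptionClassifier;
import org.springframework.dao.DataAccessException;
import org.springframework.dao.TypeMismatchDataAccessException;
import org.springframework.retry.RetryState;
import org.springframework.retry.support.DefaultRetryState;

//Name (key) of the current state; when the state is cached, it is looked up by this key
Object key = "mykey";
//Whether to create a fresh context on every call (true) or look it up in the cache (false);
//the latter is the global mode used, for example, by the circuit-breaker policy
boolean isForceRefresh = true;
//Roll back on DataAccessException
BinaryExceptionClassifier rollbackClassifier =
        new BinaryExceptionClassifier(Collections.<Class<? extends Throwable>>singleton(DataAccessException.class));
RetryState state = new DefaultRetryState(key, isForceRefresh, rollbackClassifier);

//template is the RetryTemplate configured above. TypeMismatchDataAccessException is a DataAccessException,
//so it is classified for rollback and execute() rethrows it to the caller instead of retrying
String result = template.execute(new RetryCallback<String, RuntimeException>() {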
    @Override
    public String doWithRetry(RetryContext context) throws RuntimeException {
        System.out.println("retry count:" + context.getRetryCount());
        throw new TypeMismatchDataAccessException("");
    }
}, new RecoveryCallback<String>() {
    @Override
    public String recover(RetryContext context) throws Exception {
        return "default";
    }
}, state);

In RetryTemplate, this is the code that rethrows immediately in the stateful rollback case:

//shouldRethrow: state != null && state.rollbackFor(context.getLastThrowable())
//In stateful retry, an exception that requires rollback is rethrown immediately
if (shouldRethrow(retryPolicy,context, state)) {
    throw RetryTemplate.<E>wrapIfNecessary(e);
}
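Stateful retry moves the attempt count out of the call stack and into a store keyed by the state's key. The following is a minimal sketch of that idea under stated assumptions: a `HashMap` stands in for spring-retry's context cache, `StatefulRetry` is a hypothetical class, and a `Predicate` plays the role of the rollback classifier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of stateful retry, not spring-retry's RetryContextCache:
// the failure count is kept in a map keyed by a business key, so it survives
// across separate calls (e.g. a redelivered message or a re-run transaction).
// Exceptions that need a rollback are rethrown at once so the caller can roll back;
// other exceptions are retried inside the same call.
final class StatefulRetry {
    private final Map<Object, Integer> failures = new HashMap<>();
    private final int maxAttempts;
    private final Predicate<Throwable> rollbackFor;

    StatefulRetry(int maxAttempts, Predicate<Throwable> rollbackFor) {
        this.maxAttempts = maxAttempts;
        this.rollbackFor = rollbackFor;
    }

    <T> T execute(Object key, Supplier<T> callback, Supplier<T> recovery) {
        while (failures.getOrDefault(key, 0) < maxAttempts) {
            try {
                T result = callback.get();
                failures.remove(key);                  // success clears the stored state
                return result;
            } catch (RuntimeException e) {
                failures.merge(key, 1, Integer::sum);  // the count outlives this call
                if (rollbackFor.test(e)) {
                    throw e;                           // caller rolls back, then calls again
                }
            }
        }
        failures.remove(key);                          // exhausted: clear the state and recover
        return recovery.get();
    }
}
```

With a rollback exception on every call and maxAttempts=3, the first three calls throw and the fourth returns the recovery value; with a non-rollback exception, all three attempts happen inside a single call.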

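The circuit-breaker case. With stateful retry in global mode, retries are not handled within the current loop; they are tracked globally rather than in the thread's own call. A test with the circuit-breaker policy looks like this: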

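import org.springframework.retry.RecoveryCallback;
import org.springframework.retry.RetryCallback;
import org.springframework.retry.RetryContext;
import org.springframework.retry.RetryState;
import org.springframework.retry.policy.CircuitBreakerRetryPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.DefaultRetryState;
import org.springframework.retry.support.RetryTemplate;

RetryTemplate template = new RetryTemplate();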
CircuitBreakerRetryPolicy retryPolicy =
        new CircuitBreakerRetryPolicy(new SimpleRetryPolicy(3));
retryPolicy.setOpenTimeout(5000);
retryPolicy.setResetTimeout(20000);
template.setRetryPolicy(retryPolicy);

for (int i = 0; i < 10; i++) {
    try {
        Object key = "circuit";
        boolean isForceRefresh = false;
        RetryState state = new DefaultRetryState(key, isForceRefresh);
        String result = template.execute(new RetryCallback<String, RuntimeException>() {
            @Override
            public String doWithRetry(RetryContext context) throws RuntimeException {
                System.out.println("retry count:" + context.getRetryCount());
                throw new RuntimeException("timeout");
            }
        }, new RecoveryCallback<String>() {
            @Override
            public String recover(RetryContext context) throws Exception {
                return "default";
            }
        }, state);
        System.out.println(result);
    } catch (Exception e) {
        System.out.println(e);
    }
}

Why is this called global mode? Because isForceRefresh is set to false, the context is fetched from the cache by the key "circuit", so every call gets the same context.

Object key = "circuit";
boolean isForceRefresh = false;
RetryState state = new DefaultRetryState(key, isForceRefresh);

The following RetryTemplate code shows that in stateful global mode, the template does not retry within the loop:
if (state != null && context.hasAttribute(GLOBAL_STATE)) {
   break;
}

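CircuitBreakerRetryPolicy takes three parameters:

delegate: the policy that actually decides whether to retry; once it no longer allows a retry, the circuit-breaker logic takes over.
openTimeout (openWindow in the code): the time window in which the circuit opens. After openTimeout has elapsed, the circuit becomes half-open, and it closes as soon as one retry succeeds.
resetTimeout (timeout in the code): the time after which an open circuit is reset and closed again.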

The code that decides whether the circuit is open:

public boolean isOpen() {
   long time = System.currentTimeMillis() - this.start;
   boolean retryable = this.policy.canRetry(this.context);
   if (!retryable) {//The delegate policy no longer allows a retry
      //The reset timeout has elapsed: close the circuit and reset the context
      if (time > this.timeout) {
         this.context = createDelegateContext(policy, getParent());
         this.start = System.currentTimeMillis();
         retryable = this.policy.canRetry(this.context);
      } else if (time < this.openWindow) {
         //Still inside the open window: open the circuit and short-circuit immediately
         if ((Boolean) getAttribute(CIRCUIT_OPEN) == false) {
            setAttribute(CIRCUIT_OPEN, true);
         }
         this.start = System.currentTimeMillis();
         return true;
      }
   } else {//The delegate policy still allows a retry
      //Half-open (past the open window): close the circuit and reset the context
      if (time > this.openWindow) {
         this.start = System.currentTimeMillis();
         this.context = createDelegateContext(policy, getParent());
      }
   }
   setAttribute(CIRCUIT_OPEN, !retryable);
   return !retryable;
}

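As the code shows, spring-retry's circuit-breaking logic is fairly simple:

If the delegate refuses a retry within the open window [0, openWindow), the circuit opens and calls are short-circuited immediately.
If the delegate refuses a retry after the reset timeout has passed (time > timeout), the circuit closes again.
In the half-open state [openWindow, timeout], as long as the delegate allows a retry, the context is reset and the circuit closes.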
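The three rules can be exercised deterministically with a small sketch that mirrors the isOpen() excerpt above but uses an injectable clock. `CircuitSketch` is hypothetical; the delegate is reduced to a failure counter standing in for SimpleRetryPolicy, and real spring-retry versions may differ in details such as exactly when start is reset.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch that mirrors the isOpen() excerpt above with an injectable
// clock. The "delegate" is reduced to a failure counter (like SimpleRetryPolicy);
// openWindow corresponds to openTimeout and timeout to resetTimeout.
final class CircuitSketch {
    private final int maxAttempts;
    private final long openWindow;
    private final long timeout;
    private final LongSupplier clock;
    private int failures;
    private long start;

    CircuitSketch(int maxAttempts, long openWindow, long timeout, LongSupplier clock) {
        this.maxAttempts = maxAttempts;
        this.openWindow = openWindow;
        this.timeout = timeout;
        this.clock = clock;
        this.start = clock.getAsLong();
    }

    void recordFailure() {
        failures++;
    }

    boolean isOpen() {
        long time = clock.getAsLong() - start;
        boolean retryable = failures < maxAttempts;    // delegate.canRetry(context)
        if (!retryable) {
            if (time > timeout) {                      // rule 2: reset timeout passed, close again
                failures = 0;
                start = clock.getAsLong();
                retryable = true;
            } else if (time < openWindow) {            // rule 1: failed inside the open window, trip
                start = clock.getAsLong();
                return true;
            }
        } else if (time > openWindow) {                // rule 3: half-open and retryable, fresh context
            failures = 0;
            start = clock.getAsLong();
        }
        return !retryable;                             // stays open between openWindow and timeout
    }
}
```

With maxAttempts=3, openWindow=5000 and timeout=20000, three failures at t=0 open the circuit at t=1s, it is still open at t=7s, and it closes again at t=22s.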

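The delegate of CircuitBreakerRetryPolicy should be a count-based SimpleRetryPolicy or a timeout-based TimeoutRetryPolicy. Because these policies then operate in global mode rather than per call, configure the attempt count or timeout with care.

For example, with SimpleRetryPolicy set to 3 attempts, openWindow=5s, and timeout=20s, consider an edge case of CircuitBreakerRetryPolicy.

A particular time sequence: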

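1s: retryable=false, the retry fails, the circuit is open and calls are short-circuited; start is reset to the current time.
2s: retryable=false, the retry fails, the circuit is open and calls are short-circuited; start is reset to the current time.
7s: retryable=true, so a retry is allowed, but time=5s, so time > this.openWindow is false; CIRCUIT_OPEN=false and the call is not short-circuited. The retry count is now 3, equal to the maximum.
10s: retryable=false because the retry count exceeds 3; time=8s, so time < this.openWindow is false; the circuit is open and stays open until the timeout expires. Configure this period carefully, otherwise the circuit stays open too long (the default timeout is 20s).
All retries in (7s, 20s]: same as at 10s.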

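This is the special case in which the retry count exactly equals the maximum and time equals openWindow; in practice it almost never happens.

Unlike Hystrix, spring-retry does not open and close the circuit based on a failure-rate threshold.

If you also need local, in-loop retries, you have to combine multiple RetryTemplates.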
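For contrast, a failure-rate breaker trips on the proportion of recent failures rather than on an exhausted attempt count. The sketch below is a hypothetical count-based sliding window (`FailureRateBreaker`, `windowSize`, `minimumCalls`, and `threshold` are assumptions); Hystrix itself uses a time-based rolling window, so this only illustrates the idea.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a failure-RATE circuit breaker over the last `windowSize`
// calls. It is NOT Hystrix's implementation (Hystrix uses a time-based rolling window);
// it only illustrates tripping on a percentage rather than on an attempt count.
final class FailureRateBreaker {
    private final int windowSize;      // how many recent calls to remember
    private final int minimumCalls;    // do not judge on too few samples
    private final double threshold;    // e.g. 0.5 = open at >= 50% failures
    private final Deque<Boolean> window = new ArrayDeque<>();
    private int failures;

    FailureRateBreaker(int windowSize, int minimumCalls, double threshold) {
        this.windowSize = windowSize;
        this.minimumCalls = minimumCalls;
        this.threshold = threshold;
    }

    void record(boolean success) {
        window.addLast(success);
        if (!success) {
            failures++;
        }
        if (window.size() > windowSize) {
            boolean evictedSuccess = window.removeFirst();   // slide the window
            if (!evictedSuccess) {
                failures--;
            }
        }
    }

    boolean isOpen() {
        return window.size() >= minimumCalls
                && (double) failures / window.size() >= threshold;
    }
}
```

With a window of 4 calls and a 50% threshold, the sequence success, failure, failure, success opens the breaker; two further successes push the old failures out and close it again.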

Statistics

spring-retry implements the interceptor pattern through RetryListener and ships StatisticsListener, which collects statistics about retry operations.

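import org.springframework.retry.RetryCallback;
import org.springframework.retry.RetryContext;
import org.springframework.retry.RetryListener;
import org.springframework.retry.RetryStatistics;
import org.springframework.retry.stats.DefaultStatisticsRepository;
import org.springframework.retry.stats.StatisticsListener;
import org.springframework.retry.support.RetryTemplate;

RetryTemplate template = new RetryTemplate();
DefaultStatisticsRepository repository = new DefaultStatisticsRepository();
StatisticsListener listener = new StatisticsListener(repository);
template.setListeners(new RetryListener[]{listener});

for (int i = 0; i < 10; i++) {
    String result = template.execute(new RetryCallback<String, RuntimeException>() {
        @Override
        public String doWithRetry(RetryContext context) throws RuntimeException {
            //Name the operation so that its statistics can be looked up later
            context.setAttribute(RetryContext.NAME, "method.key");
            return "ok";
        }
    });
}
RetryStatistics statistics = repository.findOne("method.key");
System.out.println(statistics);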

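Here each operation must be given a name, such as "method.key", so that its statistics can be looked up.

For more, see Chapter 5, "Degradation Techniques", and Chapter 6, "Timeouts and Retry Mechanisms", of 《億級流量網站架構核心技術》.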
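Why the name matters can be seen in a sketch that keeps one set of counters per operation name. `NamedRetryStats` and its counters are hypothetical and only loosely mirror the open/onError/close hooks of RetryListener; they are not StatisticsListener's actual fields.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-name retry statistics, loosely in the spirit of
// StatisticsListener + DefaultStatisticsRepository. Class and counter names are assumptions.
final class NamedRetryStats {
    static final class Counters {
        int started;
        int completed;
        int errors;
        int recovered;
    }

    private final Map<String, Counters> byName = new HashMap<>();

    private Counters counters(String name) {
        return byName.computeIfAbsent(name, k -> new Counters());
    }

    void open(String name) {                       // like RetryListener#open
        counters(name).started++;
    }

    void onError(String name) {                    // like RetryListener#onError
        counters(name).errors++;
    }

    void close(String name, boolean succeeded, boolean recovered) { // like RetryListener#close
        Counters c = counters(name);
        if (succeeded) {
            c.completed++;
        }
        if (recovered) {
            c.recovered++;
        }
    }

    Counters find(String name) {                   // like findOne(name); null if never seen
        return byName.get(name);
    }
}
```

Operations recorded without a consistent name would be spread over different keys (or none), which is why the example above sets RetryContext.NAME before looking up "method.key".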
