
Cannot enlarge string buffer containing XX bytes by XX more bytes


The ELK-based database alerting system flagged the following error on one of our machines:

2018-12-04 18:55:26.842 CST,"XXX","XXX",21106,"XXX",5c065c3d.5272,4,"idle",2018-12-04 18:51:41 CST,117/0,0,ERROR,54000,"out of memory","Cannot enlarge string buffer containing 0 bytes by 1342177281 more bytes.",,,,,,,"enlargeStringInfo, stringinfo.c:268",""

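Seeing "out of memory", I first assumed the whole database instance was in trouble, but a check of the production system showed the database running normally. After some reading I learned that PostgreSQL (pg) limits the length of a single query statement: once the statement reaches roughly 1 GB (MaxAllocSize, which is 1 GB minus 1 byte), pg raises the error above.

The 1342177281 bytes in the log above (about 1.25 GB) is the length of the query.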
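To make the arithmetic concrete, here is a minimal sketch of the size check that produces this error. The helper name would_exceed_limit and the macro MAX_ALLOC_SIZE are hypothetical and not part of PostgreSQL; only the constant 0x3fffffff and the comparison itself are taken from the 9.6 source quoted in this post.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same value as MaxAllocSize in memutils.h: 1 gigabyte - 1 */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/*
 * Hypothetical helper (not a PostgreSQL function): returns true when
 * appending `needed` bytes to a buffer that already holds `len` bytes
 * would be rejected by the guard in enlargeStringInfo.
 */
bool
would_exceed_limit(int len, int needed)
{
    /* Negative sizes and already-oversized buffers are always rejected. */
    if (len < 0 || needed < 0 || (size_t) len > MAX_ALLOC_SIZE)
        return true;

    /* Same comparison as the guard in stringinfo.c. */
    return (size_t) needed >= MAX_ALLOC_SIZE - (size_t) len;
}
```

With the values from the log, would_exceed_limit(0, 1342177281) returns true, because 1342177281 is larger than 0x3fffffff (1073741823).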

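A similar error often shows up when using COPY. In that case, use the error message to locate the offending line, then check whether a quoting or escaping problem kept that line from terminating properly, or whether a single line really is larger than 1 GB.

Below is the relevant code I found in the pg 9.6 source.

With the comments, the pg source is easy to follow.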

src/include/utils/memutils.h

/*
 * MaxAllocSize, MaxAllocHugeSize
 *      Quasi-arbitrary limits on size of allocations.
 *
 * Note:
 *      There is no guarantee that smaller allocations will succeed, but
 *      larger requests will be summarily denied.
 *
 * palloc() enforces MaxAllocSize, chosen to correspond to the limiting size
 * of varlena objects under TOAST.  See VARSIZE_4B() and related macros in
 * postgres.h.  Many datatypes assume that any allocatable size can be
 * represented in a varlena header.  This limit also permits a caller to use
 * an "int" variable for an index into or length of an allocation.  Callers
 * careful to avoid these hazards can access the higher limit with
 * MemoryContextAllocHuge().  Both limits permit code to assume that it may
 * compute twice an allocation's size without overflow.
 */
#define MaxAllocSize    ((Size) 0x3fffffff)     /* 1 gigabyte - 1 */

src/backend/lib/stringinfo.c

/*
* enlargeStringInfo
*
* Make sure there is enough space for 'needed' more bytes
* ('needed' does not include the terminating null).
*
* External callers usually need not concern themselves with this, since
* all stringinfo.c routines do it automatically.  However, if a caller
* knows that a StringInfo will eventually become X bytes large, it
* can save some palloc overhead by enlarging the buffer before starting
* to store data in it.
*
* NB: because we use repalloc() to enlarge the buffer, the string buffer
* will remain allocated in the same memory context that was current when
* initStringInfo was called, even if another context is now current.
* This is the desired and indeed critical behavior!
*/
void
enlargeStringInfo(StringInfo str, int needed)
{
   int         newlen;

   /*
    * Guard against out-of-range "needed" values.  Without this, we can get
    * an overflow or infinite loop in the following.
    */
   if (needed < 0)             /* should not happen */
       elog(ERROR, "invalid string enlargement request size: %d", needed);
   if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
       ereport(ERROR,
               (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                errmsg("out of memory"),
                errdetail("Cannot enlarge string buffer containing %d bytes by %d more bytes.",
                          str->len, needed)));

   needed += str->len + 1;     /* total space required now */

   /* Because of the above test, we now have needed <= MaxAllocSize */

   if (needed <= str->maxlen)
       return;                 /* got enough space already */

   /*
    * We don't want to allocate just a little more space with each append;
    * for efficiency, double the buffer size each time it overflows.
    * Actually, we might need to more than double it if 'needed' is big...
    */
   newlen = 2 * str->maxlen;
   while (needed > newlen)
       newlen = 2 * newlen;

   /*
    * Clamp to MaxAllocSize in case we went past it.  Note we are assuming
    * here that MaxAllocSize <= INT_MAX/2, else the above loop could
    * overflow.  We will still have newlen >= needed.
    */
   if (newlen > (int) MaxAllocSize)
       newlen = (int) MaxAllocSize;

   str->data = (char *) repalloc(str->data, newlen);

   str->maxlen = newlen;
}
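The growth policy in enlargeStringInfo can be tried out in isolation. The following is a sketch under stated assumptions: next_buffer_size and MAX_ALLOC_SIZE are hypothetical names, the function only computes the size that would be passed to repalloc without allocating anything, and it returns -1 where the real code would raise an error.

```c
/* Same value as MaxAllocSize in memutils.h: 1 gigabyte - 1 */
#define MAX_ALLOC_SIZE 0x3fffffff

/*
 * Hypothetical helper (not a PostgreSQL function): compute the buffer
 * size enlargeStringInfo would end up with for a buffer of `maxlen`
 * allocated bytes holding `len` bytes, when `needed` more bytes are
 * requested. Returns -1 for requests the real code would reject.
 */
int
next_buffer_size(int maxlen, int len, int needed)
{
    int total;
    int newlen;

    /* Reject invalid inputs and anything the size guard would refuse. */
    if (maxlen <= 0 || maxlen > MAX_ALLOC_SIZE || len < 0 || needed < 0)
        return -1;
    if (needed >= MAX_ALLOC_SIZE - len)
        return -1;

    total = needed + len + 1;   /* payload plus terminating null */
    if (total <= maxlen)
        return maxlen;          /* enough space already */

    /* Double until the request fits, then clamp to the limit. */
    newlen = 2 * maxlen;
    while (total > newlen)
        newlen = 2 * newlen;
    if (newlen > MAX_ALLOC_SIZE)
        newlen = MAX_ALLOC_SIZE;

    return newlen;
}
```

Starting from a 1024-byte buffer, a request for 5000 more bytes yields 8192 rather than 5001: the size is doubled until the request fits, which keeps the number of reallocations low for strings that grow by repeated appends.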

src/include/lib/stringinfo.h

Below is the struct used to store strings:

/*-------------------------
 * StringInfoData holds information about an extensible string.
 *      data    is the current buffer for the string (allocated with palloc).
 *      len     is the current string length.  There is guaranteed to be
 *              a terminating '\0' at data[len], although this is not very
 *              useful when the string holds binary data rather than text.
 *      maxlen  is the allocated size in bytes of 'data', i.e. the maximum
 *              string size (including the terminating '\0' char) that we can
 *              currently store in 'data' without having to reallocate
 *              more space.  We must always have maxlen > len.
 *      cursor  is initialized to zero by makeStringInfo or initStringInfo,
 *              but is not otherwise touched by the stringinfo.c routines.
 *              Some routines use it to scan through a StringInfo.
 *-------------------------
 */
typedef struct StringInfoData
{
    char       *data;
    int         len;
    int         maxlen;
    int         cursor;
} StringInfoData;

typedef StringInfoData *StringInfo;

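The StringInfoData struct, which holds text or binary data, hints at why pg string types do not support \u0000: strings in pg are C strings, which end with \0. In ASCII this character is called NUL; its Unicode escape is \u0000 and its hexadecimal value is 0x00. If a string contains \0, pg treats it as the end of the string.

pg strings cannot contain a NUL character (0x00). This is clearly different from the SQL NULL value, which pg does support.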
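The effect of an embedded NUL on C-string handling can be shown with a small sketch. The function visible_length is a hypothetical helper written for this post, not something from the pg source; it reports how many bytes of a buffer a C-string routine would see before stopping.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the number of bytes in buf[0..len) that
 * come before the first NUL byte, i.e. the length a C-string function
 * such as strlen would report. Returns len if there is no NUL byte.
 */
size_t
visible_length(const char *buf, size_t len)
{
    const char *nul = memchr(buf, '\0', len);

    return nul ? (size_t) (nul - buf) : len;
}
```

For the 7-byte buffer "abc\0def", visible_length returns 3: everything after the embedded NUL is invisible to code that relies on the terminator.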

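In practice, strip \u0000 from the data before importing it into a pg database.

When importing from another database into pg, the following replacement can be used:

regexp_replace(stringWithNull, '\\u0000', '', 'g')

In a Java program:

str.replaceAll("\u0000", "")

In vim:

:%s/\%x00//g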
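The same cleanup can be done on a raw byte buffer before the data is sent to pg. This is a minimal sketch, assuming the data is already in memory; strip_nul_bytes is a hypothetical helper name and not a pg or libpq function.

```c
#include <stddef.h>

/*
 * Hypothetical helper: remove every NUL (0x00) byte from buf[0..len)
 * in place and return the new length. The bytes past the returned
 * length are left unspecified; the caller adds a terminator if needed.
 */
size_t
strip_nul_bytes(char *buf, size_t len)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] != '\0')
            buf[out++] = buf[i];    /* keep every non-NUL byte */
    }
    return out;
}
```

The buffer is compacted in a single pass, so no second allocation is needed even for large inputs.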

References:

src/backend/lib/stringinfo.c

src/include/lib/stringinfo.h

src/include/utils/memutils.h

https://en.wikipedia.org/wiki/Null-terminated_string

https://stackoverflow.com/questions/1347646/postgres-error-on-insert-error-invalid-byte-sequence-for-encoding-utf8-0x0?rq=1
