The problem of hiveserver using too much memory

阿新 · Published: 2019-02-08

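Today, to solve the problem of hiveserver using too much memory, I joined Hive's Apache mailing list and discussed it there for quite a while. The people on the list are really enthusiastic, and the overseas developers are very conscientious: every reply in the thread was detailed.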

Here are some excerpts from the thread (newest message first):

Did you update your JDK recently? A Java developer told me that it could be
an issue in JDK _26
(https://forums.oracle.com/forums/thread.jspa?threadID=2309872); some
devs report a memory decrease when they use GC flags. I'm not quite
sure, though; it seems a long shot to me.

The stacks show a lot of waiting threads, but I see nothing special.

- Alex

2011/12/12 王鋒 <[email protected]>:
>
> The hive log:
>
> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt
> 8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)] 9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00, real=0.08 secs]
> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt
> 8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)] 8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01, real=0.07 secs]
> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121842_1930999319.txt
>
> Now we have 3 hiveservers and I set the number of concurrent jobs to 4, but the memory
> is still very large. It's driving me mad.
>
> Any other suggestions?
>
> On 2011-12-12 17:59:52, "alo alt" <[email protected]> wrote:
>>When you start a high-load Hive query, can you watch the stack traces?
>>It's possible via the web interface:
>>http://jobtracker:50030/stacks
>>
>>- Alex
>>
>>
>>2011/12/12 王鋒 <[email protected]>
>>>
>>> hiveserver throws an OOM error after several hours.
>>>
>>>
>>> At 2011-12-12 17:39:21, "alo alt" <[email protected]> wrote:
>>>
>>> What happens when you set -Xmx2048m or similar? Does that have any negative effect on running queries?
>>>
>>> 2011/12/12 王鋒 <[email protected]>
>>>>
>>>> I have modified the Hive JVM args.
>>>> The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m.
>>>>
>>>> But the memory used by hiveserver is still large.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2011-12-12 16:20:54,"Aaron Sun" <[email protected]> wrote:
>>>>
>>>> Not from the running jobs. What I am saying is that the heap size of Hadoop really depends on the number of files and directories on HDFS. Removing old files periodically or merging small files would bring some performance boost.
>>>>
>>>> On the Hive end, the memory consumed also depends on the queries that are executed. Monitor the reducers of the Hadoop job; in my experience, the reduce phase could be the bottleneck here.
>>>>
>>>> It's totally okay to host multiple Hive servers on one machine.
>>>>
>>>> 2011/12/12 王鋒 <[email protected]>
>>>>>
>>>>> Do you mean the files from jobs that have run on our system? Those can't be that large.
>>>>>
>>>>> Why would the NameNode be the cause? What is hiveserver doing when it uses so much memory?
>>>>>
>>>>> How do you use Hive? Is our way of using hiveserver correct?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On 2011-12-12 14:27:09, "Aaron Sun" <[email protected]> wrote:
>>>>>
>>>>> Not sure if this is because of the number of files, since the NameNode tracks every file, directory, and block.
>>>>> See this one: http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>
>>>>> Please correct me if I am wrong, because this seems to be more of an HDFS problem, which is actually unrelated to Hive.
>>>>>
>>>>> Thanks
>>>>> Aaron
>>>>>
>>>>> 2011/12/11 王鋒 <[email protected]>
>>>>>>
>>>>>>
>>>>>> I want to know why hiveserver uses so much memory, and where that memory is being used.
>>>>>>
>>>>>> On 2011-12-12 10:02:44, "王鋒" <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> The namenode summary:
>>>>>>
>>>>>>
>>>>>>
>>>>>> The MR summary:
>>>>>>
>>>>>>
>>>>>> and hiveserver:
>>>>>>
>>>>>>
>>>>>> hiveserver JVM args:
>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>>>>>>
>>>>>> Now we are running 3 hiveservers on the same machine.
>>>>>>
>>>>>>
>>>>>> On 2011-12-12 09:54:29, "Aaron Sun" <[email protected]> wrote:
>>>>>>
>>>>>> What does the data look like? And what's the size of the cluster?
>>>>>>
>>>>>> 2011/12/11 王鋒 <[email protected]>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>     I'm one of the engineers at sina.com. We have been using Hive and hiveserver for several months. We have our own task scheduling system, which schedules tasks to run on hiveserver via JDBC.
>>>>>>>
>>>>>>>     But hiveserver uses a lot of memory, usually more than 10 GB. We have 5-minute tasks that run every 5 minutes, plus hourly tasks; the total number of tasks is 40. We start 3 hiveservers on one Linux server and connect to them in rotation.
>>>>>>>
>>>>>>>     So why does hiveserver use so much memory, and what should we do? Any suggestions?
>>>>>>>
>>>>>>> Thanks and Best Regards!
>>>>>>>
>>>>>>> Royce Wang
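The GC lines 王鋒 pasted use the JDK 6 ParallelGC -XX:+PrintGCDetails format. Each record reads "[PSYoungGen: young before->after(young capacity)] heap before->heap after(heap capacity)". Subtracting young-after from heap-after gives the old-generation occupancy right after each minor collection, which minor GCs do not reclaim. At 8159.581s that is about 7175568K (roughly 6.8 GB) of a 9867648K heap. Below is a minimal shell sketch of that arithmetic. The function name gc_old_gen_after is hypothetical, and the sketch assumes one GC record per log line (the email had wrapped them).

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each ParallelGC minor-collection record
# ("[GC [PSYoungGen: a->bK(cK)] d->eK(fK), ...") print the old-generation
# occupancy after the collection, estimated as heap_after - young_after.
# Reads log files given as arguments, or stdin; non-GC lines are skipped.
gc_old_gen_after() {
  sed -nE 's/.*\[PSYoungGen: [0-9]+K->([0-9]+)K\([0-9]+K\)] [0-9]+K->([0-9]+)K\(([0-9]+)K\).*/\1 \2 \3/p' "$@" |
    awk '{
      young_after = $1; heap_after = $2; heap_cap = $3
      old_after = heap_after - young_after
      # sizes in the log are in KB, so divide by 1024*1024 to get GB
      printf "old_gen_after=%dK (%.1f GB) heap_capacity=%dK\n", old_after, old_after / 1048576, heap_cap
    }'
}

# Example with the first record from the thread:
echo '8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)] 9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00, real=0.08 secs]' |
  gc_old_gen_after
# prints: old_gen_after=7175568K (6.8 GB) heap_capacity=9867648K
```

A steadily high value here means most of the heap sits in the old generation between full collections. That matches the symptom of hiveserver memory staying large.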

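At the top of the thread, Alex pointed to an article:

https://forums.oracle.com/forums/thread.jspa?threadID=2309872 which says JDK 1.6.0_26 carries a risk of memory leaks, and that is exactly the version we are running. Judging from that thread, though, nobody can confirm it, and Oracle, naturally, says it is not their responsibility.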

I tried with java6u29 and java7 and they work great. Actually on the production server we are running for almost 4 days with java7 and it's stable, no crash, no slowdown, no restart in this period, and with less maximum memory. If it's going to last for a week then I trust it will go on fine.

In the end, someone reported that Java 6u29 and Java 7 both run stably, especially Java 7. Tomorrow I'll try switching the hiveserver machine to Java 7.

Update:

Today I switched to JDK 7 for testing, and the behavior was essentially the same, so the problem does not seem to be a JVM bug.

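Using jmap -heap, I found that hiveserver's young generation was not sized according to the NewRatio setting. Its maximum capacity was still the default 800 MB, which is too small for data analysis. I configured the young generation with -Xmn, set the maximum young-generation size explicitly, and switched the garbage collector to CMS. Memory usage is now stable at around 2.3 GB.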
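The jmap -heap finding can be cross-checked with simple arithmetic. HotSpot's -XX:NewRatio is the old:young size ratio, so the young generation should get heap / (NewRatio + 1), unless a MaxNewSize limit caps it. With the earlier -Xmx15000m -XX:NewRatio=1, that is about 7500 MB, far above the 800 MB that jmap reported. This is consistent with the reading above that a default limit was still in effect, which setting -Xmn and -XX:MaxNewSize explicitly removes. The following is a back-of-the-envelope sketch. expected_young_mb is a hypothetical helper, and real JVMs round generation sizes to alignment boundaries, so treat the numbers as approximate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate young-generation size in MB.
# -XX:NewRatio=N means old:young = N:1, so young = heap / (N + 1).
# An optional third argument models a -XX:MaxNewSize cap (in MB).
expected_young_mb() {
  local heap_mb=$1 new_ratio=$2 max_new_mb=${3:-}
  local young_mb=$(( heap_mb / (new_ratio + 1) ))
  if [ -n "$max_new_mb" ] && [ "$young_mb" -gt "$max_new_mb" ]; then
    young_mb=$max_new_mb
  fi
  echo "$young_mb"
}

expected_young_mb 15000 1       # -Xmx15000m -XX:NewRatio=1 -> 7500
expected_young_mb 15000 1 800   # same, but capped at 800 MB -> 800
```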

The final parameters:

export HADOOP_OPTS="$HADOOP_OPTS -Xms5000m -Xmn4000m -XX:MaxNewSize=4000m -Xss128k \
  -XX:MaxHeapFreeRatio=80 -XX:MinHeapFreeRatio=40 \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 \
  -XX:-UseGCOverheadLimit -XX:MaxTenuringThreshold=8 \
  -XX:PermSize=800M -XX:MaxPermSize=800M -XX:GCTimeRatio=19 \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
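This line appends to "$HADOOP_OPTS" rather than replacing it. If the earlier export with -XX:+UseParallelGC -XX:+UseParallelOldGC is still in effect, the JVM would see two collector families at once. As I understand it, HotSpot 7/8 refuses to start when more than one of Serial, Parallel (UseParallelGC/UseParallelOldGC), CMS (UseConcMarkSweepGC/UseParNewGC), or G1 is selected. For repeated -XX flags, the last occurrence wins. The check_gc_flags helper below is hypothetical and only models the Parallel vs CMS/ParNew pair used in this post. It is a sketch for reviewing an options string, not a replacement for the JVM's own check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an options string enables both the
# Parallel collector family and the CMS/ParNew family. Later flags override
# earlier ones, mirroring how repeated -XX boolean flags are processed.
check_gc_flags() {
  local -a flags
  read -r -a flags <<< "$1"
  local pgc=0 pogc=0 cms=0 parnew=0 flag
  for flag in "${flags[@]}"; do
    case $flag in
      -XX:+UseParallelGC)      pgc=1 ;;
      -XX:-UseParallelGC)      pgc=0 ;;
      -XX:+UseParallelOldGC)   pogc=1 ;;
      -XX:-UseParallelOldGC)   pogc=0 ;;
      -XX:+UseConcMarkSweepGC) cms=1 ;;
      -XX:-UseConcMarkSweepGC) cms=0 ;;
      -XX:+UseParNewGC)        parnew=1 ;;
      -XX:-UseParNewGC)        parnew=0 ;;
    esac
  done
  if (( (pgc || pogc) && (cms || parnew) )); then
    echo "conflict: ParallelGC family and CMS/ParNew both enabled"
    return 1
  fi
  echo "ok"
}

old_opts="-XX:NewRatio=1 -Xms15000m -XX:+UseParallelGC -XX:+UseParallelOldGC"
new_opts="-Xms5000m -Xmn4000m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
check_gc_flags "$new_opts"                    # prints: ok
check_gc_flags "$old_opts $new_opts" || true  # prints the conflict message
```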
