
MapReduce memory problems under YARN


Background

I ran into the following problems while running jobs with Hadoop's streaming jar.

Problem 1:

18/10/13 19:40:56 INFO input.FileInputFormat: Total input files to process : 701930
18/10/13 20:04:22 INFO retry.RetryInvocationHandler: java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: GC overhead limit exceeded, while invoking ClientNamenodeProtocolTranslatorPB.getBlockLocations over 2.master.mz/192.168.10.224:8020. Trying to failover immediately.

18/10/13 20:05:04 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/admonitor/.staging/job_1539157945372_30633
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.String.substring(String.java:1933)
        at java.util.Formatter.parse(Formatter.java:2567)
        at java.util.Formatter.format(Formatter.java:2501)
        at java.util.Formatter.format(Formatter.java:2455)
        at java.lang.String.format(String.java:2940)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:471)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)

This is a client-side OutOfMemoryError: the job has 701,930 input files, and the client JVM exhausts its heap (GC overhead limit exceeded) while calling getBlockLocations for that huge number of small files. The fix is to raise the client heap via HADOOP_CLIENT_OPTS; options set there apply to all Hadoop client commands, such as fs, dfs, fsck, and distcp.

HADOOP_CLIENT_OPTS="-Xmx8192M" hadoop jar $stream_jar ...
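Since HADOOP_CLIENT_OPTS is picked up by the client launcher scripts, you can also export it once and have it cover every subsequent client command in the shell session. A minimal sketch (the paths are placeholders, not from the original job):

```shell
# Raise the client-side JVM heap for all Hadoop client commands
export HADOOP_CLIENT_OPTS="-Xmx8192M"

# These now run with the larger heap, e.g. listing or copying
# a directory tree with a very large number of small files:
hadoop fs -ls /user/admonitor/input
hadoop distcp /user/admonitor/input /user/admonitor/backup
```

Note this only affects the client JVM (job submission, fs operations), not the map or reduce containers, which are sized separately as shown under Problem 2.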

Problem 2:

Container [pid=100823,containerID=container_e39_1539157945372_36692_01_000527] is running 108359680B beyond the 'PHYSICAL' memory limit. Current usage: 1.1 GB of 1 GB physical memory used; 3.1 GB of 2.1 GB virtual memory used. Killing container.

Here the container exceeded its physical and virtual memory limits. The fix is to allocate more memory; note that you first need to determine whether it is the map or the reduce phase that runs out:

  -Dmapreduce.map.memory.mb=8192 \
  -Dmapreduce.map.java.opts=-Xmx7168M \
  -Dmapreduce.reduce.memory.mb=4096 \
  -Dmapreduce.reduce.java.opts=-Xmx3072M \
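The numbers in the log line above are internally consistent: with the default vmem-pmem-ratio of 2.1, a 1 GB container gets a 2.1 GB virtual memory limit, and the reported 108359680 B overrun on top of the 1 GB physical limit is exactly the "1.1 GB of 1 GB" usage. A quick check:

```python
GB = 1024 ** 3

pmem_limit = 1 * GB                   # container physical memory limit
vmem_ratio = 2.1                      # yarn.nodemanager.vmem-pmem-ratio default
vmem_limit = pmem_limit * vmem_ratio  # derived virtual memory limit

overrun = 108359680                   # bytes beyond the limit, from the log
usage = pmem_limit + overrun          # actual physical usage

print(round(vmem_limit / GB, 1))  # 2.1 -> matches "2.1 GB virtual memory"
print(round(usage / GB, 1))       # 1.1 -> matches "1.1 GB of 1 GB physical"
```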

Memory parameter configuration under YARN

Parameter descriptions

| name | default | description |
| --- | --- | --- |
| yarn.nodemanager.resource.memory-mb | 8192 MB | Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated (on Windows and Linux); in other cases the default is 8192 MB. |
| yarn.nodemanager.vmem-pmem-ratio | 2.1 | Virtual memory ratio: each container may use up to this multiple of its physical memory allocation as virtual memory. |
| yarn.scheduler.minimum-allocation-mb | 1024 MB | Minimum memory a single container can request. |
| yarn.scheduler.maximum-allocation-mb | 8192 MB | Maximum memory a single container can request. |
| mapreduce.map.memory.mb / mapreduce.reduce.memory.mb | | Container size for a map / reduce task. |
| mapreduce.map.java.opts / mapreduce.reduce.java.opts | | JVM options for the task JVM launched inside the container. The heap (-Xmx) must be smaller than the corresponding memory.mb; a common rule of thumb is about 0.75 × memory.mb. |
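The 0.75 rule of thumb for java.opts can be captured in a small helper. This is only a sketch of the sizing convention described above (the function name and factor are mine, not a Hadoop API); the idea is to leave roughly a quarter of the container for non-heap JVM overhead:

```python
def heap_for_container(memory_mb: int, factor: float = 0.75) -> str:
    """Suggest an -Xmx value for a given mapreduce.*.memory.mb,
    leaving (1 - factor) of the container for non-heap JVM overhead
    (metaspace, thread stacks, native buffers)."""
    heap_mb = int(memory_mb * factor)
    return f"-Xmx{heap_mb}m"

print(heap_for_container(4096))  # -Xmx3072m, the reduce setting used above
print(heap_for_container(8192))  # -Xmx6144m (the map example above uses a
                                 # more aggressive 7168m, about 0.875x)
```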