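Deploying Apache Kylin with Read/Write Separation on CDH
I. Why Deploy Read/Write Separation
Our projects currently run stably on CDH 5.6. The HBase 1.0.0 that ships with it cannot run Kylin correctly, because Kylin requires HBase 1.1.x or later. The candidate solutions were:
1. Upgrade the whole CDH installation to get a newer HBase (too risky).
2. Strip HBase out of CDH and replace it with a newer vanilla HBase (drawbacks: HBase becomes inconvenient to manage, and existing applications are hard to migrate).
3. Kylin read/write separation. Testing showed that the HBase in CDH 5.6 lets Kylin build cubes but cannot serve reads (the API is incompatible), so configuring a newer HBase on a second cluster is enough to solve the problem. This option is highly feasible: it leaves existing applications untouched and also improves Kylin's availability, two wins at once.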
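II. Environment
As the figure above shows, Kylin supports read/write separation. The feature was originally designed to split cluster load between reads and writes, so that queries stay fast, stable, and available.
When a cube build request is issued from the front end, the Build runs on the compute cluster. Once the cube has been computed, the result is written as HFiles and loaded into the HBase cluster, which then serves front-end reads. For my current environment, the figure above can be abstracted as:
Kylin version: apache-kylin-2.4.0-bin-cdh57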
Cluster      IP          Hostname                  Notes
CDH5.6       10.5.8.10   see-data-pre-master-01    Cluster A master (CDH5.6)
CDH5.6       10.5.8.17   see-data-pre-slave-1      Cluster A worker
CDH5.15.0    10.5.8.12   test-data-master-1        Cluster B master (CDH5.15.0)
CDH5.15.0    10.5.8.6    test-data-slave-1         Cluster B worker
CDH5.15.0    10.5.8.7    test-data-slave-2         Cluster B worker
From here on, the CDH 5.6 cluster is called Cluster A and the CDH 5.15.0 cluster is called Cluster B.
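III. Deployment Approach
As the name suggests, read/write separation for Kylin means pointing write operations at Cluster A and read operations at Cluster B. In terms of configuration, this comes down to:
1. Copy the Hadoop, MR, Hive, and YARN configuration from Cluster A into Kylin's configuration directory.
2. Copy the HBase configuration files from Cluster B into Kylin's configuration directory.
3. In kylin.properties, set the properties that point to Cluster A and Cluster B.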
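IV. Deployment Steps
1. First make sure every machine in both clusters has hostname mappings configured, that the machines can reach each other without a password, and that both clusters run normally.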
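A quick way to check the hostname mappings is to look each hostname up in the hosts file. The helper below is only a sketch: the function name missing_hosts is made up for illustration, and it assumes the mappings live in a hosts-style file such as /etc/hosts rather than in DNS.

```shell
# missing_hosts <hosts_file> <hostname>...
# Prints every hostname that has no uncommented entry in the hosts file.
# (Hypothetical helper, not part of Kylin or CDH.)
missing_hosts() {
  local file="$1"
  shift
  local host
  for host in "$@"; do
    awk -v h="$host" '
      /^[ \t]*#/ { next }                 # skip comment lines
      {
        for (i = 2; i <= NF; i++) {       # field 1 is the IP address
          if ($i ~ /^#/) break            # stop at a trailing comment
          if ($i == h) found = 1
        }
      }
      END { exit !found }                 # exit 0 only if the host was seen
    ' "$file" || echo "$host"
  done
}

# Example, run on each machine (hostnames taken from the table above):
# missing_hosts /etc/hosts see-data-pre-master-01 see-data-pre-slave-1 \
#   test-data-master-1 test-data-slave-1 test-data-slave-2
```

An empty output means every listed hostname has a mapping on that machine.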
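2. Download Kylin and extract it on the Cluster B machine test-data-slave-2 under /home/hadoop/kylin/apache-kylin-2.4.0-bin-cdh57. This directory is $KYLIN_HOME.
3. Copy all configuration files to $KYLIN_HOME (by default CDH keeps its configuration files under /etc/hadoop/conf, /etc/hive/conf, …)
- Copy core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml from /etc/hadoop/conf on Cluster A to the conf directory of $KYLIN_HOME
- Copy hive-site.xml from /etc/hive/conf on Cluster A to the conf directory of $KYLIN_HOME
- Copy hbase-site.xml from /etc/hbase/conf on Cluster B to the conf directory of $KYLIN_HOME
In principle, none of the files copied from the clusters need to be changed. However, if an HDFS or Hive address in them points to a local address, it must be changed to an address that is reachable remotely!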
[hadoop@test-data-slave-2 conf]$ ll
total 76
-rw-r--r-- 1 hadoop data  3865 Dec 11 15:37 core-site.xml
-rw-r--r-- 1 hadoop data  2926 Dec 11 15:42 hbase-site.xml
-rw-r--r-- 1 hadoop data  1748 Dec 11 15:37 hdfs-site.xml
-rw-r--r-- 1 hadoop data  5517 Dec 11 15:41 hive-site.xml
-rw-r--r-- 1 hadoop data  3605 Jun 20 15:53 kylin_hive_conf.xml
-rw-r--r-- 1 hadoop data  3807 Jun 20 15:53 kylin_job_conf_inmem.xml
-rw-r--r-- 1 hadoop data  3386 Dec 12 11:08 kylin_job_conf.xml
-rw-r--r-- 1 hadoop data  1156 Jun 20 15:53 kylin-kafka-consumer.xml
-rw-r--r-- 1 hadoop data 13112 Dec 11 20:35 kylin.properties
-rw-r--r-- 1 hadoop data  1339 Jun 20 15:53 kylin-server-log4j.properties
-rw-r--r-- 1 hadoop data  1656 Jun 20 15:53 kylin-tools-log4j.properties
-rw-r--r-- 1 hadoop data  4563 Dec 11 15:40 mapred-site.xml
-rwxr-xr-x 1 hadoop data  3649 Jun 20 15:53 setenv.sh
-rw-r--r-- 1 hadoop data  3828 Dec 11 15:39 yarn-site.xml
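To spot values that still point to a local address in the copied files, a simple grep over the *-site.xml files can help. This is a rough sketch: the function name list_local_addresses is made up, and it assumes that "local address" means localhost or 127.0.0.1. Adjust the pattern if your cluster uses something else.

```shell
# list_local_addresses <conf_dir>
# Prints file:line:<value>...</value> for every value in <conf_dir>/*-site.xml
# that mentions localhost or 127.0.0.1. Prints nothing if all values look remote.
# (Hypothetical helper; the pattern is an assumption about what counts as local.)
list_local_addresses() {
  grep -HnoE '<value>[^<]*(localhost|127\.0\.0\.1)[^<]*</value>' "$1"/*-site.xml
}

# Example:
# list_local_addresses "$KYLIN_HOME/conf"
```

Any line it prints is a candidate for replacing with the remote hostname of the corresponding cluster.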
The contents of the main configuration files are listed below:
core-site.xml
<?xml version="1.0" encoding="UTF-8"?> <!--Autogenerated by Cloudera Manager--> <configuration> <property> <name>fs.defaultFS</name> <value>hdfs://see-data-pre-master-01:8020</value> </property> <property> <name>fs.trash.interval</name> <value>1</value> </property> <property> <name>io.compression.codecs</name> <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value> </property> <property> <name>hadoop.security.authentication</name> <value>simple</value> </property> <property> <name>hadoop.security.authorization</name> <value>false</value> </property> <property> <name>hadoop.rpc.protection</name> <value>authentication</value> </property> <property> <name>hadoop.security.auth_to_local</name> <value>DEFAULT</value> </property> <property> <name>hadoop.proxyuser.oozie.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.oozie.groups</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.mapred.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.mapred.groups</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.flume.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.flume.groups</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.HTTP.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.HTTP.groups</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hive.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hive.groups</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hue.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hue.groups</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.httpfs.hosts</name> <value>*</value> 
</property> <property> <name>hadoop.proxyuser.httpfs.groups</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hdfs.groups</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hdfs.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.yarn.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.yarn.groups</name> <value>*</value> </property> <property> <name>hadoop.security.group.mapping</name> <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value> </property> <property> <name>hadoop.security.instrumentation.requires.admin</name> <value>false</value> </property> <property> <name>net.topology.script.file.name</name> <value>/etc/hadoop/conf.cloudera.yarn/topology.py</value> </property> <property> <name>io.file.buffer.size</name> <value>65536</value> </property> <property> <name>hadoop.ssl.enabled</name> <value>false</value> </property> <property> <name>hadoop.ssl.require.client.cert</name> <value>false</value> <final>true</final> </property> <property> <name>hadoop.ssl.keystores.factory.class</name> <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value> <final>true</final> </property> <property> <name>hadoop.ssl.server.conf</name> <value>ssl-server.xml</value> <final>true</final> </property> <property> <name>hadoop.ssl.client.conf</name> <value>ssl-client.xml</value> <final>true</final> </property> </configuration>
hbase-site.xml
<?xml version="1.0" encoding="UTF-8"?> <!--Autogenerated by Cloudera Manager--> <configuration> <property> <name>hbase.rootdir</name> <value>hdfs://test-data-master-1:8020/hbase_test</value> </property> <property> <name>hbase.client.write.buffer</name> <value>2097152</value> </property> <property> <name>hbase.client.pause</name> <value>100</value> </property> <property> <name>hbase.client.retries.number</name> <value>35</value> </property> <property> <name>hbase.client.scanner.caching</name> <value>100</value> </property> <property> <name>hbase.client.keyvalue.maxsize</name> <value>10485760</value> </property> <property> <name>hbase.ipc.client.allowsInterrupt</name> <value>true</value> </property> <property> <name>hbase.client.primaryCallTimeout.get</name> <value>10</value> </property> <property> <name>hbase.client.primaryCallTimeout.multiget</name> <value>10</value> </property> <property> <name>hbase.fs.tmp.dir</name> <value>/user/${user.name}/hbase-staging</value> </property> <property> <name>hbase.client.scanner.timeout.period</name> <value>60000</value> </property> <property> <name>hbase.coprocessor.region.classes</name> <value>org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint</value> </property> <property> <name>hbase.regionserver.thrift.http</name> <value>false</value> </property> <property> <name>hbase.thrift.support.proxyuser</name> <value>false</value> </property> <property> <name>hbase.rpc.timeout</name> <value>60000</value> </property> <property> <name>hbase.snapshot.enabled</name> <value>true</value> </property> <property> <name>hbase.snapshot.master.timeoutMillis</name> <value>60000</value> </property> <property> <name>hbase.snapshot.region.timeout</name> <value>60000</value> </property> <property> <name>hbase.snapshot.master.timeout.millis</name> <value>60000</value> </property> <property> <name>hbase.security.authentication</name> <value>simple</value> </property> <property> <name>hbase.rpc.protection</name> <value>authentication</value> 
</property> <property> <name>zookeeper.session.timeout</name> <value>60000</value> </property> <property> <name>zookeeper.znode.parent</name> <value>/hbase_test</value> </property> <property> <name>zookeeper.znode.rootserver</name> <value>root-region-server-test</value> </property> <property> <name>hbase.zookeeper.quorum</name> <value>test-data-master-1,test-data-slave-2,test-data-slave-1</value> </property> <property> <name>hbase.zookeeper.property.clientPort</name> <value>2181</value> </property> <property> <name>hbase.rest.ssl.enabled</name> <value>false</value> </property> </configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?> <!--Autogenerated by Cloudera Manager--> <configuration> <property> <name>dfs.namenode.name.dir</name> <value>file:///dfs/nn</value> </property> <property> <name>dfs.namenode.servicerpc-address</name> <value>see-data-pre-master-01:8022</value> </property> <property> <name>dfs.https.address</name> <value>see-data-pre-master-01:50470</value> </property> <property> <name>dfs.https.port</name> <value>50470</value> </property> <property> <name>dfs.namenode.http-address</name> <value>see-data-pre-master-01:50070</value> </property> <property> <name>dfs.replication</name> <value>3</value> </property> <property> <name>dfs.blocksize</name> <value>134217728</value> </property> <property> <name>dfs.client.use.datanode.hostname</name> <value>false</value> </property> <property> <name>fs.permissions.umask-mode</name> <value>022</value> </property> <property> <name>dfs.namenode.acls.enabled</name> <value>false</value> </property> <property> <name>dfs.client.use.legacy.blockreader</name> <value>false</value> </property> <property> <name>dfs.client.read.shortcircuit</name> <value>false</value> </property> <property> <name>dfs.domain.socket.path</name> <value>/var/run/hdfs-sockets/dn</value> </property> <property> <name>dfs.client.read.shortcircuit.skip.checksum</name> <value>false</value> </property> <property> <name>dfs.client.domain.socket.data.traffic</name> <value>false</value> </property> <property> <name>dfs.datanode.hdfs-blocks-metadata.enabled</name> <value>true</value> </property> </configuration>
hive-site.xml
<?xml version="1.0" encoding="UTF-8"?> <!--Autogenerated by Cloudera Manager--> <configuration> <property> <name>hive.metastore.uris</name> <value>thrift://see-data-pre-master-01:9083</value> </property> <property> <name>hive.metastore.client.socket.timeout</name> <value>300</value> </property> <property> <name>hive.metastore.warehouse.dir</name> <value>/user/hive/warehouse</value> </property> <property> <name>hive.warehouse.subdir.inherit.perms</name> <value>true</value> </property> <property> <name>hive.enable.spark.execution.engine</name> <value>false</value> </property> <property> <name>hive.conf.restricted.list</name> <value>hive.enable.spark.execution.engine</value> </property> <property> <name>hive.auto.convert.join</name> <value>true</value> </property> <property> <name>hive.auto.convert.join.noconditionaltask.size</name> <value>20971520</value> </property> <property> <name>hive.optimize.bucketmapjoin.sortedmerge</name> <value>false</value> </property> <property> <name>hive.smbjoin.cache.rows</name> <value>10000</value> </property> <property> <name>mapred.reduce.tasks</name> <value>-1</value> </property> <property> <name>hive.exec.reducers.bytes.per.reducer</name> <value>67108864</value> </property> <property> <name>hive.exec.copyfile.maxsize</name> <value>33554432</value> </property> <property> <name>hive.exec.reducers.max</name> <value>1099</value> </property> <property> <name>hive.vectorized.groupby.checkinterval</name> <value>4096</value> </property> <property> <name>hive.vectorized.groupby.flush.percent</name> <value>0.1</value> </property> <property> <name>hive.compute.query.using.stats</name> <value>false</value> </property> <property> <name>hive.vectorized.execution.enabled</name> <value>true</value> </property> <property> <name>hive.vectorized.execution.reduce.enabled</name> <value>false</value> </property> <property> <name>hive.merge.mapfiles</name> <value>true</value> </property> <property> <name>hive.merge.mapredfiles</name> <value>false</value> 
</property> <property> <name>hive.cbo.enable</name> <value>false</value> </property> <property> <name>hive.fetch.task.conversion</name> <value>minimal</value> </property> <property> <name>hive.fetch.task.conversion.threshold</name> <value>268435456</value> </property> <property> <name>hive.limit.pushdown.memory.usage</name> <value>0.1</value> </property> <property> <name>hive.merge.sparkfiles</name> <value>true</value> </property> <property> <name>hive.merge.smallfiles.avgsize</name> <value>16777216</value> </property> <property> <name>hive.merge.size.per.task</name> <value>268435456</value> </property> <property> <name>hive.optimize.reducededuplication</name> <value>true</value> </property> <property> <name>hive.optimize.reducededuplication.min.reducer</name> <value>4</value> </property> <property> <name>hive.map.aggr</name> <value>true</value> </property> <property> <name>hive.map.aggr.hash.percentmemory</name> <value>0.5</value> </property> <property> <name>hive.optimize.sort.dynamic.partition</name> <value>false</value> </property> <property> <name>spark.executor.memory</name> <value>268435456</value> </property> <property> <name>spark.driver.memory</name> <value>268435456</value> </property> <property> <name>spark.executor.cores</name> <value>1</value> </property> <property> <name>spark.yarn.driver.memoryOverhead</name> <value>26</value> </property> <property> <name>spark.yarn.executor.memoryOverhead</name> <value>26</value> </property> <property> <name>spark.dynamicAllocation.enabled</name> <value>true</value> </property> <property> <name>spark.dynamicAllocation.initialExecutors</name> <value>1</value> </property> <property> <name>spark.dynamicAllocation.minExecutors</name> <value>1</value> </property> <property> <name>spark.dynamicAllocation.maxExecutors</name> <value>2147483647</value> </property> <property> <name>hive.metastore.execute.setugi</name> <value>true</value> </property> <property> <name>hive.support.concurrency</name> <value>true</value> 
</property> <property> <name>hive.zookeeper.quorum</name> <value>see-data-pre-master-01</value> </property> <property> <name>hive.zookeeper.client.port</name> <value>2181</value> </property> <property> <name>hive.zookeeper.namespace</name> <value>hive_zookeeper_namespace_hive</value> </property> <property> <name>hbase.zookeeper.quorum</name> <value>see-data-pre-master-01</value> </property> <property> <name>hbase.zookeeper.property.clientPort</name> <value>2181</value> </property> <property> <name>hive.cluster.delegation.token.store.class</name> <value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value> </property> <property> <name>hive.server2.enable.doAs</name> <value>true</value> </property> <property> <name>hive.server2.use.SSL</name> <value>false</value> </property> <property> <name>spark.shuffle.service.enabled</name> <value>true</value> </property> </configuration>
mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?> <!--Autogenerated by Cloudera Manager--> <configuration> <property> <name>mapreduce.job.split.metainfo.maxsize</name> <value>10000000</value> </property> <property> <name>mapreduce.job.counters.max</name> <value>120</value> </property> <property> <name>mapreduce.output.fileoutputformat.compress</name> <value>false</value> </property> <property> <name>mapreduce.output.fileoutputformat.compress.type</name> <value>BLOCK</value> </property> <property> <name>mapreduce.output.fileoutputformat.compress.codec</name> <value>org.apache.hadoop.io.compress.DefaultCodec</value> </property> <property> <name>mapreduce.map.output.compress.codec</name> <value>org.apache.hadoop.io.compress.SnappyCodec</value> </property> <property> <name>mapreduce.map.output.compress</name> <value>true</value> </property> <property> <name>zlib.compress.level</name> <value>DEFAULT_COMPRESSION</value> </property> <property> <name>mapreduce.task.io.sort.factor</name> <value>64</value> </property> <property> <name>mapreduce.map.sort.spill.percent</name> <value>0.8</value> </property> <property> <name>mapreduce.reduce.shuffle.parallelcopies</name> <value>10</value> </property> <property> <name>mapreduce.task.timeout</name> <value>600000</value> </property> <property> <name>mapreduce.client.submit.file.replication</name> <value>1</value> </property> <property> <name>mapreduce.job.reduces</name> <value>5</value> </property> <property> <name>mapreduce.task.io.sort.mb</name> <value>256</value> </property> <property> <name>mapreduce.map.speculative</name> <value>false</value> </property> <property> <name>mapreduce.reduce.speculative</name> <value>false</value> </property> <property> <name>mapreduce.job.reduce.slowstart.completedmaps</name> <value>0.8</value> </property> <property> <name>mapreduce.jobhistory.address</name> <value>see-data-pre-master-01:10020</value> </property> <property> <name>mapreduce.jobhistory.webapp.address</name> 
<value>see-data-pre-master-01:19888</value> </property> <property> <name>mapreduce.jobhistory.webapp.https.address</name> <value>see-data-pre-master-01:19890</value> </property> <property> <name>mapreduce.jobhistory.admin.address</name> <value>see-data-pre-master-01:10033</value> </property> <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property> <property> <name>yarn.app.mapreduce.am.staging-dir</name> <value>/user</value> </property> <property> <name>mapreduce.am.max-attempts</name> <value>2</value> </property> <property> <name>yarn.app.mapreduce.am.resource.mb</name> <value>1024</value> </property> <property> <name>yarn.app.mapreduce.am.resource.cpu-vcores</name> <value>1</value> </property> <property> <name>mapreduce.job.ubertask.enable</name> <value>false</value> </property> <property> <name>yarn.app.mapreduce.am.command-opts</name> <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value> </property> <property> <name>mapreduce.map.java.opts</name> <value>-Djava.net.preferIPv4Stack=true</value> </property> <property> <name>mapreduce.reduce.java.opts</name> <value>-Djava.net.preferIPv4Stack=true</value> </property> <property> <name>yarn.app.mapreduce.am.admin.user.env</name> <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH</value> </property> <property> <name>mapreduce.map.memory.mb</name> <value>0</value> </property> <property> <name>mapreduce.map.cpu.vcores</name> <value>1</value> </property> <property> <name>mapreduce.reduce.memory.mb</name> <value>0</value> </property> <property> <name>mapreduce.reduce.cpu.vcores</name> <value>1</value> </property> <property> <name>mapreduce.job.heap.memory-mb.ratio</name> <value>0.8</value> </property> <property> <name>mapreduce.application.classpath</name> <value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH</value> </property> <property> <name>mapreduce.admin.user.env</name> 
<value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH</value> </property> <property> <name>mapreduce.shuffle.max.connections</name> <value>80</value> </property> </configuration>
yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?> <!--Autogenerated by Cloudera Manager--> <configuration> <property> <name>yarn.acl.enable</name> <value>true</value> </property> <property> <name>yarn.admin.acl</name> <value>*</value> </property> <property> <name>yarn.resourcemanager.address</name> <value>see-data-pre-master-01:8032</value> </property> <property> <name>yarn.resourcemanager.admin.address</name> <value>see-data-pre-master-01:8033</value> </property> <property> <name>yarn.resourcemanager.scheduler.address</name> <value>see-data-pre-master-01:8030</value> </property> <property> <name>yarn.resourcemanager.resource-tracker.address</name> <value>see-data-pre-master-01:8031</value> </property> <property> <name>yarn.resourcemanager.webapp.address</name> <value>see-data-pre-master-01:8088</value> </property> <property> <name>yarn.resourcemanager.webapp.https.address</name> <value>see-data-pre-master-01:8090</value> </property> <property> <name>yarn.resourcemanager.client.thread-count</name> <value>50</value> </property> <property> <name>yarn.resourcemanager.scheduler.client.thread-count</name> <value>50</value> </property> <property> <name>yarn.resourcemanager.admin.client.thread-count</name> <value>1</value> </property> <property> <name>yarn.scheduler.minimum-allocation-mb</name> <value>1024</value> </property> <property> <name>yarn.scheduler.increment-allocation-mb</name> <value>512</value> </property> <property> <name>yarn.scheduler.maximum-allocation-mb</name> <value>3374</value> </property> <property> <name>yarn.scheduler.minimum-allocation-vcores</name> <value>1</value> </property> <property> <name>yarn.scheduler.increment-allocation-vcores</name> <value>1</value> </property> <property> <name>yarn.scheduler.maximum-allocation-vcores</name> <value>8</value> </property> <property> <name>yarn.resourcemanager.amliveliness-monitor.interval-ms</name> <value>1000</value> </property> <property> <name>yarn.am.liveness-monitor.expiry-interval-ms</name> 
<value>600000</value> </property> <property> <name>yarn.resourcemanager.am.max-attempts</name> <value>2</value> </property> <property> <name>yarn.resourcemanager.container.liveness-monitor.interval-ms</name> <value>600000</value> </property> <property> <name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name> <value>1000</value> </property> <property> <name>yarn.nm.liveness-monitor.expiry-interval-ms</name> <value>600000</value> </property> <property> <name>yarn.resourcemanager.resource-tracker.client.thread-count</name> <value>50</value> </property> <property> <name>yarn.application.classpath</name> <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value> </property> <property> <name>yarn.resourcemanager.scheduler.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value> </property> <property> <name>yarn.scheduler.fair.user-as-default-queue</name> <value>true</value> </property> <property> <name>yarn.scheduler.fair.preemption</name> <value>false</value> </property> <property> <name>yarn.scheduler.fair.sizebasedweight</name> <value>false</value> </property> <property> <name>yarn.scheduler.fair.assignmultiple</name> <value>false</value> </property> <property> <name>yarn.resourcemanager.max-completed-applications</name> <value>10000</value> </property> </configuration>
4. Append the following settings to kylin.properties under $KYLIN_HOME/conf/
kylin.source.hive.beeline-shell=beeline
kylin.source.hive.beeline-params=-n hadoop --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://see-data-pre-master-01:10000
# Important: the database in Cluster A's Hive (reached through beeline) that stores the intermediate tables Kylin produces during the build
kylin.source.hive.database-for-flat-table=kylin
kylin.source.hive.redistribute-flat-table=true
kylin.storage.url=hbase
kylin.storage.hbase.cluster-fs=hdfs://test-data-master-1:8020
# Important: these are the ZooKeeper nodes of Cluster B. HBase depends on ZooKeeper, so this must be set
kylin.env.zookeeper-connect-string=test-data-master-1,test-data-slave-2,test-data-slave-1
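Because each setting must sit on its own line, it can be worth reading the values back after editing. The helper below is a minimal sketch; get_prop is a made-up name, and it assumes plain key=value lines with no spaces around the equals sign.

```shell
# get_prop <properties_file> <key>
# Prints the value of the last uncommented "key=value" line for <key>,
# or an empty line if the key is absent. (Hypothetical helper.)
get_prop() {
  awk -v k="$2" '
    /^[ \t]*#/ { next }                          # ignore comment lines
    {
      eq = index($0, "=")                        # split at the first "=" only
      if (eq > 0 && substr($0, 1, eq - 1) == k) v = substr($0, eq + 1)
    }
    END { print v }
  ' "$1"
}

# Example: the storage side should name Cluster B hosts.
# get_prop "$KYLIN_HOME/conf/kylin.properties" kylin.storage.hbase.cluster-fs
# get_prop "$KYLIN_HOME/conf/kylin.properties" kylin.env.zookeeper-connect-string
```

If a value comes back with a stray "#" comment or another key glued onto it, two lines were probably merged while editing.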
5. Configure environment variables
On the machine where Kylin is installed, edit ~/.bashrc and append the following
# hadoop
export CONF_HOME=/home/hadoop/kylin/apache-kylin-2.4.0-bin-cdh57/conf
export HBASE_CONF=$CONF_HOME
export HBASE_CONF_DIR=$CONF_HOME
export HADOOP_CONF_DIR=$CONF_HOME
export HIVE_CONF=$CONF_HOME
export HIVE_CONF_DIR=$CONF_HOME
# added by Hive hcatalog
export HCAT_HOME=/opt/cloudera/parcels/CDH/lib/hive-hcatalog
# added by KYLIN
export KYLIN_HOME=/home/hadoop/kylin/apache-kylin-2.4.0-bin-cdh57
export PATH=$KYLIN_HOME/bin:$PATH
These configuration-directory variables tell Kylin not to use the local machine's Hadoop for computation. This is important!
After editing, run source ~/.bashrc so the changes take effect!
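After sourcing the file, a small check can confirm that every configuration variable really points at Kylin's conf directory. This is a sketch only; check_conf_env is a made-up name, and the variable list simply mirrors the exports above.

```shell
# check_conf_env <expected_dir>
# Prints each configuration variable that is unset or points somewhere
# other than <expected_dir>. Prints nothing when everything matches.
# (Hypothetical helper; it only sees exported variables.)
check_conf_env() {
  local name
  local value
  for name in HADOOP_CONF_DIR HBASE_CONF HBASE_CONF_DIR HIVE_CONF HIVE_CONF_DIR; do
    value=$(printenv "$name" || true)
    [ "$value" = "$1" ] || echo "$name=${value:-<unset>} (expected $1)"
  done
}

# Example:
# check_conf_env /home/hadoop/kylin/apache-kylin-2.4.0-bin-cdh57/conf
```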
6. Verify each service individually to confirm the configuration is correct
On the Cluster B machine where Kylin is installed, run the following checks to confirm that everything points to Cluster A
- Verify HDFS: the result below is the HDFS directory on Cluster A
[hadoop@test-data-slave-2 conf]$ hdfs dfs -ls /user/hive/warehouse/
Found 2 items
drwxrwxrwt   - hadoop    hive          0 2018-12-11 19:23 /user/hive/warehouse/kylin.db
drwxrwxrwt   - superuser hive          0 2018-12-12 10:53 /user/hive/warehouse/test_default
- Verify Hive: open the Hive CLI; the databases listed are those of Cluster A
hive> show databases;
OK
default
kylin
Time taken: 1.78 seconds, Fetched: 2 row(s)
- Verify YARN: the two machines in the RUNNING list belong to Cluster A
2018-12-12 18:14:55,885 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(123)) - Connecting to ResourceManager at see-data-pre-master-01/10.5.8.10:8032
Total Nodes:2
Node-Id                        Node-State    Node-Http-Address              Number-of-Running-Containers
see-data-pre-slave-1:8041      RUNNING       see-data-pre-slave-1:8042      0
see-data-pre-master-01:8041    RUNNING       see-data-pre-master-01:8042    0
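As a complement to the live checks above, the target hosts can also be read straight from the copied XML files. The helper below is a rough sketch: site_value is a made-up name, and it uses plain text matching rather than a real XML parser, which is enough for the flat name/value layout Cloudera Manager generates.

```shell
# site_value <site_xml> <property_name>
# Prints the <value> that follows <name>property_name</name> in a Hadoop-style
# site file. Newlines are removed first, so it works whether the file is
# pretty-printed or on a single line. Dots in the name act as regex wildcards,
# which is good enough for a quick check. (Hypothetical helper.)
site_value() {
  tr -d '\n' < "$1" \
    | grep -o "<name>$2</name>[[:space:]]*<value>[^<]*</value>" \
    | sed -e 's/.*<value>//' -e 's#</value>##'
}

# Example: the first two should name Cluster A hosts, the last one Cluster B hosts.
# site_value "$KYLIN_HOME/conf/core-site.xml"  fs.defaultFS
# site_value "$KYLIN_HOME/conf/yarn-site.xml"  yarn.resourcemanager.address
# site_value "$KYLIN_HOME/conf/hbase-site.xml" hbase.zookeeper.quorum
```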
7. Go to $KYLIN_HOME/bin and start Kylin
./kylin.sh start
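8. Open the Kylin UI and build the demo cube, then check the cube-building MR jobs on the CDH YARN page of Cluster A
In the Kylin UI, check the Kylin table in which the cube is stored after the build
Verify it in the HBase of Cluster B
This completes the Kylin read/write separation setup.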