
Kylin Pitfalls: Two Traps Encountered When Building a Cube

When building a cube, the step most likely to fail is Build Dimension Dictionary, which is the fourth step, as shown in the figure below.


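In this step, Kylin runs many checks on the table columns in the background. Both pitfalls occurred here, because the data itself had the following problems:

First, the dimension table has a column named description of type longtext (it stores descriptive text and is very long), and its values are longer than Short.MAX_VALUE (a short ranges from -32768 to 32767). Even though this column was never added to dimensions in either the model or the cube, the build still failed. In other words, Kylin checks the value length of every column in a dimension table, whether or not the column was added to dimensions. The source code that raises the error is: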

if (maxValueLength < 0) {
    throw new IllegalStateException("maxValueLength is negative (" + maxValueLength + "). Dict value is too long, whose length is larger than " + Short.MAX_VALUE);
}

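Second, when creating a model, you normally join the foreign key of the fact table to the primary key of the dimension table. Hive, however, does not enforce primary or foreign keys, so Kylin checks the dimension table's join column for uniqueness whether or not it is really a primary key. Otherwise a single fact table record could join to two or more dimension table records, which is clearly invalid. The dimension table column joined to the fact table must therefore be unique and non-null, that is, it must behave as a primary key (even though Hive has no enforced notion of one).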
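The uniqueness and non-null requirement can be checked before a build is started. The following is a minimal sketch, not Kylin's own code: the class `JoinKeyCheck`, the method `findKeyProblems`, and the in-memory row format are hypothetical names chosen for illustration, and it assumes the dimension table rows have already been read into memory as string arrays. The first two sample rows are taken from the error log of the second pitfall; the third row is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JoinKeyCheck {

    /**
     * Scans the rows of a dimension (lookup) table and reports every row
     * whose join key is null or has already been seen in an earlier row.
     * keyIndex is the position of the join column within each row.
     */
    static List<String> findKeyProblems(List<String[]> rows, int keyIndex) {
        Set<String> seen = new HashSet<>();
        List<String> problems = new ArrayList<>();
        for (String[] row : rows) {
            String key = row[keyIndex];
            if (key == null) {
                problems.add("null key in row " + Arrays.toString(row));
            } else if (!seen.add(key)) {
                // Set.add returns false when the key was already present.
                problems.add("duplicate key " + key + " in row " + Arrays.toString(row));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"0001", "99.0", "100"}); // from the error log
        rows.add(new String[] {"0001", "98.0", "99"});  // from the error log
        rows.add(new String[] {"0002", "97.0", "98"});  // hypothetical row
        for (String problem : findKeyProblems(rows, 0)) {
            System.out.println(problem);
        }
    }
}
```

On a real Hive table the same check is usually done in SQL, by grouping on the join column and keeping the groups whose count is greater than one.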

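For both pitfalls we saved the error logs.

Pitfall 1: when building the cube, a value in a dimension table column was too long, and the error below was reported. After several days of work and many attempts we finally found the cause: the description column is of type longtext and its values are too long. We dropped that column from the Hive table and reloaded the data, which solved the problem.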

java.lang.IllegalStateException: stats.maxValueLength is not positive short, usually caused by too long dict value.
at org.apache.kylin.dict.TrieDictionaryBuilder.positiveShortPreCheck(TrieDictionaryBuilder.java:490)
at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:447)
at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:418)
at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:98)
at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:139)
at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:287)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:87)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:49)
at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:66)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:62)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:144)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
result code:2

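The error log does not say which column is too long. We tried many fixes and none of them worked, because we kept assuming that a column not added to the model as a dimension would not have its value length checked. We hope the Kylin developers will make this error message more detailed in a future version, ideally naming the column that is too long. Also, why check the length of columns that were never added to dimensions at all? It seems unnecessary! We hope this can be improved too.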
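Since the log does not name the column, one way to find it is to scan the dimension table yourself before building. The following is a minimal sketch under stated assumptions, not part of Kylin: `LongValueCheck` and `findTooLongColumns` are hypothetical names, the rows are assumed to be available in memory as string arrays, and value length is measured in UTF-8 bytes, which may differ from how Kylin measures it internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LongValueCheck {

    /**
     * Returns the names of all columns whose longest value is larger than
     * Short.MAX_VALUE (32767) bytes when encoded as UTF-8 (an assumption).
     * Null values are skipped.
     */
    static List<String> findTooLongColumns(String[] columns, List<String[]> rows) {
        int[] maxLength = new int[columns.length];
        for (String[] row : rows) {
            for (int i = 0; i < columns.length && i < row.length; i++) {
                if (row[i] == null) {
                    continue;
                }
                int length = row[i].getBytes(StandardCharsets.UTF_8).length;
                maxLength[i] = Math.max(maxLength[i], length);
            }
        }
        List<String> tooLong = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            if (maxLength[i] > Short.MAX_VALUE) {
                tooLong.add(columns[i]);
            }
        }
        return tooLong;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: one row with a 40000-character description.
        char[] filler = new char[40000];
        Arrays.fill(filler, 'x');
        String[] columns = {"id", "name", "description"};
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"0001", "first", "a short description"});
        rows.add(new String[] {"0002", "second", new String(filler)});
        System.out.println(findTooLongColumns(columns, rows));
    }
}
```

Any column this reports is a candidate for being dropped or truncated in the Hive table, which is how the description column was handled here.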

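Pitfall 2: when building the cube, the dimension table column joined to the fact table was not unique. Kylin's background uniqueness check reported the following error: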

java.lang.IllegalStateException: The table: SCORE4 Dup key found, key=[0001], value1=[0001,99.0,100], value2=[0001,98.0,99]
at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:86)
at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:69)
at org.apache.kylin.dict.lookup.LookupStringTable.init(LookupStringTable.java:79)
at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:57)
at org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:65)
at org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:648)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:93)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:49)
at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:66)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:62)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:144)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

result code:2

Solution: redefine the join so that the dimension table column used in it is unique.