Spark RDD: saving records into different folders by key

阿新 • Published: 2019-01-15

package org.apache.hadoop.mapred;

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.Progressable;
import org.apache.hadoop.util.ReflectionUtils;

public class TextOutputFormat<K, V> extends FileOutputFormat<K, V> {

  // Static nested class LineRecordWriter implements RecordWriter.
  // This is the part we care about.
  protected static class LineRecordWriter<K, V> implements RecordWriter<K, V> {
    private static final String utf8 = "UTF-8";
    private static final byte[] newline;
    // newline holds the bytes of "\n", which ends every record
    static {
      try {
        newline = "\n".getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }

    protected DataOutputStream out;
    private final byte[] keyValueSeparator;

    public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
      this.out = out;
      try {
        this.keyValueSeparator = keyValueSeparator.getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }

    public LineRecordWriter(DataOutputStream out) {
      this(out, "\t");
    }

    /**
     * Write the object to the byte stream, handling Text as a special
     * case.
     * @param o the object to print
     * @throws IOException if the write throws, we pass it on
     */
    private void writeObject(Object o) throws IOException {
      if (o instanceof Text) {
        Text to = (Text) o;
        out.write(to.getBytes(), 0, to.getLength());
      } else {
        out.write(o.toString().getBytes(utf8));
      }
    }

    // Core method: this is what writes the key and value to the HDFS file.
    public synchronized void write(K key, V value) throws IOException {
      boolean nullKey = key == null || key instanceof NullWritable;
      boolean nullValue = value == null || value instanceof NullWritable;
      if (nullKey && nullValue) {
        // Nothing to write when both key and value are null or NullWritable
        return;
      }
      if (!nullKey) {
        writeObject(key); // write the key
      }
      if (!(nullKey || nullValue)) {
        out.write(keyValueSeparator); // write the separator, "\t" by default
      }
      if (!nullValue) {
        writeObject(value); // write the value
      }
      out.write(newline); // write the line terminator
    }

    public synchronized void close(Reporter reporter) throws IOException {
      out.close();
    }
  }

  public RecordWriter<K, V> getRecordWriter(FileSystem ignored, JobConf job,
                                            String name, Progressable progress)
      throws IOException {
    boolean isCompressed = getCompressOutput(job);
    String keyValueSeparator =
        job.get("mapreduce.output.textoutputformat.separator", "\t");
    if (!isCompressed) {
      Path file = FileOutputFormat.getTaskOutputPath(job, name);
      FileSystem fs = file.getFileSystem(job);
      FSDataOutputStream fileOut = fs.create(file, progress);
      return new LineRecordWriter<K, V>(fileOut, keyValueSeparator);
    } else {
      Class<? extends CompressionCodec> codecClass =
          getOutputCompressorClass(job, GzipCodec.class);
      // create the named codec
      CompressionCodec codec = ReflectionUtils.newInstance(codecClass, job);
      // build the filename including the extension
      Path file = FileOutputFormat.getTaskOutputPath(
          job, name + codec.getDefaultExtension());
      FileSystem fs = file.getFileSystem(job);
      FSDataOutputStream fileOut = fs.create(file, progress);
      return new LineRecordWriter<K, V>(
          new DataOutputStream(codec.createOutputStream(fileOut)),
          keyValueSeparator);
    }
  }
}
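The `write` method above always sends key, separator, value and newline to a single stream. To get one folder per key, the writer has to choose its output path from the key instead. In Hadoop's old `mapred` API the usual hook for that is `MultipleTextOutputFormat.generateFileNameForKeyValue`; the sketch below does not use Hadoop or Spark at all. It is a plain-Java illustration of the routing idea only, and the class name `KeyedLineWriter`, the file name `part-00000` and the assumption that every key is a safe folder name are all hypothetical choices made for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Hadoop or Spark. */
public class KeyedLineWriter implements AutoCloseable {
    private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);

    private final Path baseDir;
    private final byte[] separator;
    // One open stream per key, so each key's records land in its own folder.
    private final Map<String, OutputStream> streams = new HashMap<>();

    public KeyedLineWriter(Path baseDir, String separator) {
        this.baseDir = baseDir;
        this.separator = separator.getBytes(StandardCharsets.UTF_8);
    }

    /** Writes "key<separator>value\n" into baseDir/<key>/part-00000. */
    public synchronized void write(String key, String value) throws IOException {
        if (key == null) {
            // Unlike LineRecordWriter, a key is mandatory here: it picks the folder.
            throw new IllegalArgumentException("a key is required to pick the folder");
        }
        OutputStream out = streams.get(key);
        if (out == null) {
            // Assumption: the key is usable as a directory name as-is.
            Path dir = baseDir.resolve(key);
            Files.createDirectories(dir);
            out = Files.newOutputStream(dir.resolve("part-00000"));
            streams.put(key, out);
        }
        out.write(key.getBytes(StandardCharsets.UTF_8));
        if (value != null) {
            // Same rule as LineRecordWriter: separator only when both sides exist.
            out.write(separator);
            out.write(value.getBytes(StandardCharsets.UTF_8));
        }
        out.write(NEWLINE);
    }

    @Override
    public synchronized void close() throws IOException {
        for (OutputStream out : streams.values()) {
            out.close();
        }
        streams.clear();
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("by-key");
        try (KeyedLineWriter writer = new KeyedLineWriter(base, "\t")) {
            writer.write("a", "1");
            writer.write("b", "2");
            writer.write("a", "3");
        }
        // Each key now has its own folder under the temp directory.
        System.out.println(Files.readAllLines(base.resolve("a").resolve("part-00000")));
        System.out.println(Files.readAllLines(base.resolve("b").resolve("part-00000")));
    }
}
```

In a real Spark job the same decision (key to path) would be made inside an output format rather than in a hand-rolled writer, and one stream per key only scales while the number of distinct keys per task stays small.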
