
The difference between -put and -copyFromLocal in Hadoop

See the Stack Overflow link below.

Put simply, -put is more permissive: it can copy files from either the local filesystem or HDFS into HDFS. -copyFromLocal is stricter and only copies local files into HDFS.


PS: "put would prefer the HDFS scheme instead of the local file system". In other words, if the same path exists both locally and on HDFS, -put tends to take the HDFS file as its source.

But I tested it:

hadoop fs -put hdfs:///tmp/hive-XXX/test.txt /user/XXX/test.txt.hdfs

hadoop fs -put /tmp/hive-XXX/test.txt /user/XXX/test.txt.local       

hadoop fs -cat /user/XXX/test.txt.*    

local path:/tmp/hive-XXX
local path:/tmp/hive-XXX

So... both files contain the local content.
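
One way to make this test less ambiguous is to give the two filesystems visibly different content under the same path and then see which one -put picks up. The script below is only a sketch under a few assumptions: DRY_RUN, hfs and probe_put_source are names invented for this example, and hdfs:/// is assumed to resolve to the cluster's default NameNode. By default it only prints the hadoop fs commands it would run; set DRY_RUN=0 on a machine with a configured Hadoop client to execute them.

```shell
#!/bin/sh
# Sketch only. DRY_RUN, hfs and probe_put_source are hypothetical names;
# the hadoop fs subcommands used (-mkdir -p, -copyFromLocal, -put, -cat) are real.
DRY_RUN=${DRY_RUN:-1}

# Print the hadoop command in dry-run mode, execute it otherwise.
hfs() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "hadoop fs $*"
  else
    hadoop fs "$@"
  fi
}

probe_put_source() {
  # The same absolute path is used on the local disk and on HDFS.
  dir=$(mktemp -d) || return 1
  echo "hdfs copy"  > "$dir/seed.txt"
  echo "local copy" > "$dir/abc.txt"

  # Seed HDFS with different content under the same path as the local file.
  hfs -mkdir -p "hdfs://$dir"
  hfs -copyFromLocal "$dir/seed.txt" "hdfs://$dir/abc.txt"

  # The actual question: which abc.txt does a bare source path make -put read?
  hfs -put "$dir/abc.txt" "hdfs://$dir/result.txt"
  hfs -cat "hdfs://$dir/result.txt"   # shows "local copy" or "hdfs copy"
}

probe_put_source
```

Because the two files differ in content, the final -cat shows directly which filesystem the bare source path was resolved against.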

Link: http://stackoverflow.com/questions/7811284/difference-between-hadoop-fs-put-and-hadoop-fs-copyfromlocal

——————————————————————————————————————————————

-put and -copyFromLocal are documented as identical, while most examples use the verbose variant -copyFromLocal. Why?

Same thing for -get and -copyToLocal

  • copyFromLocal is similar to put command, except that the source is restricted to a local file reference.

So, basically you can do with put, all that you do with copyFromLocal, but not vice-versa.

Similarly,

  • copyToLocal is similar to get command, except that the destination is restricted to a local file
    reference.

Hence, you can use get instead of copyToLocal, but not the other way round.

Let's take an example: suppose your HDFS contains the path /tmp/dir/abc.txt and your local disk contains the same path. The HDFS API then won't know which one you mean unless you specify a scheme like file:// or hdfs://, and it may pick the path you did not want to copy.

That is why -copyFromLocal exists: it prevents you from accidentally copying the wrong file by restricting the source parameter to the local filesystem.

Put is for more advanced users who know which scheme to put in front.
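
To sidestep the ambiguity, the scheme can be spelled out on both sides of the command. The following is a minimal sketch and not part of the original answer: qualify is a hypothetical shell helper, the source reuses the /tmp/dir/abc.txt example, /user/hadoop/abc.txt is an assumed destination, and the final command is printed instead of executed so that the sketch runs without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop command): prefix an absolute path with an
# explicit scheme so the filesystem is stated rather than inferred.
qualify() {
  fs=$1
  path=$2
  case "$path" in
    file:*|hdfs:*) printf '%s\n' "$path"; return 0 ;;   # already qualified
    /*) ;;                                               # absolute path, continue
    *) echo "qualify: need an absolute path: $path" >&2; return 1 ;;
  esac
  case "$fs" in
    local) printf 'file://%s\n' "$path" ;;
    hdfs)  printf 'hdfs://%s\n' "$path" ;;
    *) echo "qualify: unknown filesystem: $fs" >&2; return 1 ;;
  esac
}

src=$(qualify local /tmp/dir/abc.txt)
dst=$(qualify hdfs /user/hadoop/abc.txt)

# Printed rather than executed; drop the leading "echo" to run it for real.
echo hadoop fs -put "$src" "$dst"
```

With both arguments fully qualified, the command line itself states which file is read and where it is written.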

It is always a bit confusing to new Hadoop users which filesystem they are currently in and where their files actually are.

What do you mean by "the hdfs API won't know which one you mean"? For '-put' the source is always the first argument. Or you mean that some users may confuse '-put' with '-get' ? –  snappy Oct 18 '11 at 17:52
No, neither way. We are speaking about two different file systems here. HDFS and local file system (say ext4). By using bin/hadoop fs -put /tmp/somepath /user/hadoop/somepath the command actually does not know whether /tmp/somepath exists in both filesystems, or just in local filesystem. Same thing with the destination path. –  Thomas Jungblut Oct 18 '11 at 17:58
So the first parameter is not always a local fs path, so to say. You can put from one HDFS to another if you'd like. -copyFromLocal will ensure that it just picks from the local disk and uploads to HDFS. – Thomas Jungblut Oct 18 '11 at 17:58
Why does it need to know? Your command example (and the -copyFromLocal variant) always copies /tmp/somepath/* from local to /user/hadoop/somepath/* on HDFS, and creates /user/hadoop/somepath directories if they are not yet created. Right? –  snappy Oct 18 '11 at 18:08
No, put would prefer the HDFS scheme instead of the local file system. copyFromLocal would not do this and pick it from local file system. –  Thomas Jungblut Oct 19 '11 at 8:06