HDFS Command Syntax

HDFS Command Syntax Overview:
hadoop fs -cmd [args] : e.g. hadoop fs -ls
hadoop version : check that Hadoop is installed properly

HELP:
help [cmd] : show usage for cmd, or for all commands when cmd is omitted

Inspect files:
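ls/lsr : list files under a path (lsr lists recursively)
cat : print file contents on stdout
tail [-f] : output the last part of a file (-f follows appended data)
test : check a path with -e (exists), -z (zero length), or -d (is a directory); exit status 0 means true
touchz : create a new empty file of size 0
du/dus : show space utilization (dus prints a summary)
count : count directories, files, and bytes under a path
setrep : change the replication factor of a file, or of a whole directory tree with -R
stat : print information about the specified path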
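
A minimal sketch of how the inspection commands can be combined: check that a path exists, then list it and show its space usage. The function name inspect_path and the HADOOP_BIN variable are assumptions made for this example and are not part of Hadoop; HADOOP_BIN only exists so the launcher can be swapped out for a dry run.

```shell
# HADOOP_BIN is a hypothetical override; it defaults to the real hadoop launcher.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# inspect_path PATH: list PATH and print its space usage, if PATH exists.
inspect_path() {
    # -test -e exits with status 0 when the path exists.
    if "$HADOOP_BIN" fs -test -e "$1"; then
        "$HADOOP_BIN" fs -ls "$1"
        "$HADOOP_BIN" fs -du "$1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}
```

On a machine with a running cluster, this would be called as inspect_path /user/demo (the path is only an illustration).
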
Create/remove files:
mkdir : create a directory
mv : move (rename) files
cp : copy files
rm/rmr : remove files (rmr removes recursively)
Copy files between the local machine and the Hadoop cluster:
copyFromLocal : copy a local file to the HDFS
copyToLocal : copy a file on the HDFS to the local disk

cp : copies one or more files
get : copies files to the local file system
put : copies files from the local file system
mv : moves one or more files
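
The put and get commands above can be chained into a simple round trip, sketched below under stated assumptions: upload_and_fetch and HADOOP_BIN are hypothetical names introduced for this example, and the target HDFS directory is assumed to exist already.

```shell
# HADOOP_BIN is a hypothetical override; it defaults to the real hadoop launcher.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# upload_and_fetch LOCAL_SRC HDFS_DIR LOCAL_DEST
# Copies LOCAL_SRC into HDFS_DIR, then copies it back out to LOCAL_DEST.
upload_and_fetch() {
    "$HADOOP_BIN" fs -put "$1" "$2" &&
    "$HADOOP_BIN" fs -get "$2/$(basename "$1")" "$3"
}
```

For example, upload_and_fetch /home/me/data.txt /user/demo /tmp/copy.txt would leave a copy of the file in /tmp/copy.txt; all three paths are illustrative.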

Hadoop Namenode Commands:
hadoop namenode -format: Format HDFS filesystem from Namenode
hadoop namenode -upgrade: Upgrade the NameNode
start-dfs.sh : Start HDFS daemons
stop-dfs.sh : Stop HDFS daemons
start-mapred.sh : Start MapReduce daemons
stop-mapred.sh : Stop MapReduce daemons
hadoop namenode -recover -force: Recover namenode metadata after a cluster failure (may lose data) 

Hadoop Configuration Files:
core-site.xml : Parameters for entire Hadoop cluster
hdfs-site.xml : Parameters for HDFS and its clients
mapred-site.xml : Parameters for MapReduce and its clients

yarn-site.xml : Parameters for nodemanager and resource manager
masters : Host machines for secondary Namenode
slaves : List of slave hosts

hadoop-env.sh : Sets environment variables for Hadoop (hadoop-env.cmd on Windows). Example in Windows cmd syntax:
set JAVA_HOME=%JAVA_HOME%
set HADOOP_PREFIX=D:\Hadoop

Hadoop Job Commands
hadoop job -submit : Submit the job
hadoop job -status : Print job status completion percentage
hadoop job -list all : List all jobs
hadoop job -list-active-trackers : List all available TaskTrackers
hadoop job -set-priority : Set priority for a job. Valid priorities : VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW
hadoop job -kill-task : Kill a task
hadoop job -history : Display job history including job details, failed and killed jobs
Hadoop mradmin Commands
hadoop mradmin -safemode get : Check whether the JobTracker is in safe mode
hadoop mradmin -refreshQueues : Reload mapreduce configuration
hadoop mradmin -refreshNodes : Reload active TaskTrackers
hadoop mradmin -refreshServiceAcl : Force Jobtracker to reload service ACL
hadoop mradmin -refreshUserToGroupsMappings : Force jobtracker to reload user group mappings
Hadoop fsck Commands
hadoop fsck / : Filesystem check on HDFS
hadoop fsck / -files : Display files during check
hadoop fsck / -files -blocks : Display files and blocks during check
hadoop fsck / -files -blocks -locations : Display files, blocks and their locations
hadoop fsck / -files -blocks -locations -racks : Display network topology for data-node locations
hadoop fsck / -delete : Delete corrupted files
hadoop fsck / -move : Move corrupted files to the /lost+found directory
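
As a hedged sketch, the fsck report can be turned into a pass/fail check by looking for the HEALTHY status line that fsck prints at the end of its report. The function name fsck_is_healthy and the HADOOP_BIN variable are assumptions made for this example.

```shell
# HADOOP_BIN is a hypothetical override; it defaults to the real hadoop launcher.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# fsck_is_healthy PATH: exit 0 when the fsck report for PATH says HEALTHY.
fsck_is_healthy() {
    "$HADOOP_BIN" fsck "$1" | grep -q "is HEALTHY"
}
```

A cron job or deploy script could then run something like: fsck_is_healthy / || echo "HDFS needs attention".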

Hadoop Balancer Commands
start-balancer.sh : Balance the cluster
hadoop dfsadmin -setBalancerBandwidth : Set the network bandwidth, in bytes per second, that each datanode may use for balancing
hadoop balancer -threshold 20 : Balance until every datanode's disk usage is within 20 percentage points of the cluster average

Hadoop Safe Mode (Maintenance Mode) Commands
The following dfsadmin commands make the cluster enter or leave safe mode, also called maintenance mode.
In this mode, the NameNode accepts no changes to the namespace and does not replicate or delete blocks.
hadoop dfsadmin -safemode enter : Enter safe mode
hadoop dfsadmin -safemode leave : Leave safe mode
hadoop dfsadmin -safemode get : Report whether safe mode is on
hadoop dfsadmin -safemode wait : Block until the NameNode leaves safe mode
hadoop dfsadmin -report : Report capacity and usage for the cluster and each datanode
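
A small sketch of how these commands fit together in a script: if the NameNode reports that safe mode is on, wait for it to leave, then print the cluster report. The function name report_when_writable and the HADOOP_BIN variable are hypothetical, and the check assumes the status line contains the text "Safe mode is ON".

```shell
# HADOOP_BIN is a hypothetical override; it defaults to the real hadoop launcher.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# report_when_writable: wait out safe mode if needed, then print the cluster report.
report_when_writable() {
    if "$HADOOP_BIN" dfsadmin -safemode get | grep -q "Safe mode is ON"; then
        echo "NameNode is in safe mode; waiting" >&2
        "$HADOOP_BIN" dfsadmin -safemode wait
    fi
    "$HADOOP_BIN" dfsadmin -report
}
```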

Launching Hadoop Jobs:
hadoop jar jarFile [mainClass] args... : Launch a job from a jar file
hadoop jar jarFile com.twitter.scalding.Tool [jobClass] args : Launch a Scalding job
mapred job -kill jobId : Kill a MapReduce job

Commonly Used Administration Commands:
Format the namenode: hadoop namenode -format
Start the secondary namenode: hadoop secondarynamenode
Run namenode : hadoop namenode
Run data node: hadoop datanode
Cluster Balancing: hadoop balancer
Run MapReduce job tracker node: hadoop jobtracker
Run MapReduce task tracker node: hadoop tasktracker

Start/stop YARN (ResourceManager and NodeManager) and DFS (NameNode and DataNode) from the sbin directory:
start-yarn, stop-yarn
start-dfs, stop-dfs


Start and Stop ALL daemon from sbin directory:
start-all, stop-all


Check that all 5 daemons (NameNode, Secondary NameNode, JobTracker, DataNode, TaskTracker) are up:

jps
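
The jps check can be automated with a short helper, sketched here as an illustration rather than a standard tool. The function name missing_daemons is hypothetical; it reads jps output on stdin and prints the name of every expected daemon that is not running, assuming jps prints one "pid ClassName" pair per line.

```shell
# missing_daemons: read jps output on stdin, print each expected daemon that is absent.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode SecondaryNameNode JobTracker DataNode TaskTracker; do
        # Compare the second column exactly, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}
```

Typical use is jps | missing_daemons; empty output means all five daemons were found.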
