Hadoop FS | HDFS DFS Commands with Examples - Spark By {Examples} (2023)

Table of Contents
  • What is HDFS?
  • Start Hadoop Services
  • Basic HDFS DFS Commands
  • ls – List Files and Folder
  • mkdir – Make Directory
  • rm – Remove File or Directory
  • rmr – Remove Directory Recursively
  • rmdir – Delete a Directory
  • put – Upload a File to HDFS from Local
  • cat – Displays the Content of the File
  • du – Space Occupied on Disk
  • dus – Total Size of a Directory or File
  • get – Copy the File from HDFS to Local
  • getmerge – Merge Multiple Files in an HDFS
  • count – Number of Directories and Files
  • mv – Moves Files from Source to Destination
  • moveFromLocal – Move file / Folder from Local disk to HDFS
  • moveToLocal – Move a File from HDFS to Local
  • cp – Copy Files from Source to Destination
  • setrep – Changes the Replication Factor of a File
  • tail – Displays Last Kilobyte of the File
  • touch – Create and Modify Timestamps of a File
  • touchz – Create a File of zero Length
  • appendToFile – Appends the Content to the File
  • copyFromLocal – Copy File from Local file System
  • copyToLocal – Copy Files from HDFS to Local file System
  • usage – Return the Help for Individual Command
  • checksum – Returns the Checksum Information of a File
  • chgrp – Change Group Association of Files
  • chmod – Change the Permissions of a File
  • chown – Change the Owner and Group of a File
  • df – Displays free Space
  • head – Displays first Kilobyte of the File
  • createSnapshot – Create a Snapshot
  • deleteSnapshot – Delete a Snapshot
  • renameSnapshot – Rename a Snapshot
  • expunge – Create New Checkpoint
  • stat – Print File/Directory Statistics
  • truncate – Truncate Files to a Specified Length
  • find – Find Files in HDFS
  • Related Articles

The Apache Hadoop hadoop fs and hdfs dfs commands are file system commands for interacting with HDFS. They are very similar to Unix commands, although some syntax and output formats differ between the Unix and HDFS commands.

Hadoop is an open-source distributed framework used to store and process large datasets. To store data, Hadoop uses HDFS, and to process data, it uses MapReduce and YARN. In this article, I will mainly focus on the Hadoop HDFS commands for interacting with files.

Hadoop provides two commands to interact with the file system: hadoop fs and hdfs dfs. The main difference is that hadoop fs is the generic form and is supported with multiple file systems, such as S3, Azure, and many more.

What is HDFS?

HDFS is a distributed file system that stores data on commodity machines and provides very high aggregate bandwidth across the cluster.

HDFS follows a write-once, read-many model: once a file is written, you cannot change its contents in place, other than by appending to it or truncating it with the commands covered later in this article.

Start Hadoop Services

In order to run hdfs dfs or hadoop fs commands, you first need to start the Hadoop services by running the start-dfs.sh script from the Hadoop installation. If you don’t have a Hadoop setup, follow the Apache Hadoop Installation on Linux guide.

$ start-dfs.sh
Starting namenodes on [namenode.socal.rr.com]
Starting datanodes
Starting secondary namenodes [namenode]

Note that the start-dfs.sh script starts the name node, the secondary name node, and the data nodes.

Basic HDFS DFS Commands

Below are the basic hdfs dfs or hadoop fs commands.

Command          Description
-ls              Lists files and directories with permissions and other details
-mkdir           Creates a directory at the given path in HDFS
-rm              Removes a file (or, with -r, a directory)
-rmr             Removes a directory and its subfolders recursively (deprecated; use -rm -r)
-rmdir           Deletes an empty directory
-put             Uploads a file or folder from the local disk to HDFS
-cat             Displays the contents of a file
-du              Shows the size of files and directories on HDFS
-dus             Shows the total size of a directory or file (deprecated; use -du -s)
-get             Copies a file or folder from HDFS to the local file system
-getmerge        Merges multiple HDFS files into a single local file
-count           Counts the number of directories, the number of files, and the file size
-setrep          Changes the replication factor of a file
-mv              Moves files from source to destination within HDFS
-moveFromLocal   Moves a file or folder from the local disk to HDFS
-moveToLocal     Moves a file from HDFS to the local disk
-cp              Copies files from source to destination
-tail            Displays the last kilobyte of a file
-touch           Creates a file or updates the timestamps of an existing file
-touchz          Creates a new file on HDFS with a size of 0 bytes
-appendToFile    Appends content to a file that is present on HDFS
-copyFromLocal   Copies a file from the local file system to HDFS
-copyToLocal     Copies files from HDFS to the local file system
-usage           Returns the help for an individual command
-checksum        Returns the checksum information of a file
-chgrp           Changes the group association of a file or path
-chmod           Changes the permissions of a file
-chown           Changes the owner and group of a file
-df              Displays free space
-head            Displays the first kilobyte of a file
-createSnapshot  Creates a snapshot of a snapshottable directory
-deleteSnapshot  Deletes a snapshot from a snapshottable directory
-renameSnapshot  Renames a snapshot
-expunge         Empties the trash and creates a new checkpoint
-stat            Prints statistics about a file or directory
-truncate        Truncates all files that match the specified file pattern to the specified length
-find            Finds files that match the specified expression

ls – List Files and Folder

The HDFS ls command displays the list of files and directories in HDFS, along with their permissions, user, group, and other details. For more information, see ls – List Files and Folder.

$ hadoop fs -ls
or
$ hdfs dfs -ls
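
As a rough sketch of working with the listing, the snippet below filters ls output down to regular files and prints their size and path. It assumes the usual eight-column listing (permissions, replication, owner, group, size, date, time, path) and paths without spaces. The function name ls_files_only and the sample lines are hypothetical, not output captured from a real cluster.

```shell
# Keep only regular files (permission string starts with "-") and print "size path".
# Assumes the eight-column layout described above and paths without spaces.
ls_files_only() {
  awk '$1 ~ /^-/ { print $5, $8 }'
}

# Hypothetical sample listing; on a cluster, pipe `hdfs dfs -ls /some/path` into the function instead.
printf '%s\n' \
  'Found 2 items' \
  'drwxr-xr-x   - hadoop supergroup          0 2023-05-09 10:00 /user/hadoop/dir1' \
  '-rw-r--r--   3 hadoop supergroup       1024 2023-05-09 10:05 /user/hadoop/file1.txt' \
  | ls_files_only
```

The header line and the directory entry are dropped because their first field does not start with a hyphen.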

mkdir – Make Directory

The HDFS mkdir command creates a directory in HDFS. By default, the directory is owned by the user who creates it. A path that begins with “/” is created relative to the root directory.

$ hadoop fs -mkdir /directory-name
or
$ hdfs dfs -mkdir /directory-name

rm – Remove File or Directory

The HDFS rm command deletes a file from HDFS. To delete a directory and its contents recursively, add the -r option.

$ hadoop fs -rm /file-name
or
$ hdfs dfs -rm /file-name

rmr – Remove Directory Recursively

The rmr command deletes a directory and everything under it recursively, which is useful when you want to delete a non-empty directory. In current Hadoop releases, -rmr is deprecated in favor of -rm -r.

$ hadoop fs -rmr /directory-name
or
$ hdfs dfs -rmr /directory-name
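
Because -rmr and -dus are deprecated shortcuts for -rm -r and -du -s, older scripts can be updated mechanically. The following is a minimal sketch that only prints the rewritten command line and does not run it. The helper name modernize is an assumption made for this illustration.

```shell
# Rewrite a deprecated sub-command into its current form and print the resulting command line.
# Anything else is passed through unchanged. Nothing is executed against a cluster.
modernize() {
  cmd="$1"
  shift
  case "$cmd" in
    -rmr) cmd="-rm -r" ;;
    -dus) cmd="-du -s" ;;
  esac
  echo "hdfs dfs $cmd $*"
}

modernize -rmr /directory-name
modernize -dus /hdfs-directory
```

The two calls print the -rm -r and -du -s forms of the commands shown in this article.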

rmdir – Delete a Directory

The rmdir command removes directories only if they are empty.

$ hadoop fs -rmdir /directory-name
or
$ hdfs dfs -rmdir /directory-name

put – Upload a File to HDFS from Local

Copies a file or folder from the local disk to HDFS. The put command takes the local-file-path you want to copy from, followed by the hdfs-file-path you want to copy to on HDFS.

$ hadoop fs -put /local-file-path /hdfs-file-path
or
$ hdfs dfs -put /local-file-path /hdfs-file-path

cat – Displays the Content of the File

The cat command reads the specified file from HDFS and displays its content on the console or stdout.

$ hadoop fs -cat /hdfs-file-path
or
$ hdfs dfs -cat /hdfs-file-path
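
The mkdir, put, and cat commands are often used together. The sketch below chains them in a dry-run form: the run helper (an assumption of this example, not part of Hadoop) prints each command and executes it only when RUN=1 is set. The paths ./notes.txt and /user/hadoop/demo are hypothetical.

```shell
# Dry-run helper: prints each command; set RUN=1 to also execute it on a real cluster.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Create the target directory, upload a local file into it, then print the uploaded copy.
# $1 = local file, $2 = HDFS directory (both hypothetical in the call below).
upload_and_show() {
  run hdfs dfs -mkdir -p "$2"
  run hdfs dfs -put "$1" "$2"
  run hdfs dfs -cat "$2/$(basename "$1")"
}

upload_and_show ./notes.txt /user/hadoop/demo
```

The -p option on mkdir creates missing parent directories, so the sketch does not fail when /user/hadoop does not exist yet.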

du – Space Occupied on Disk

The du command shows how much space a file or directory occupies. The size reported is the base size of the file or directory before replication.

$ hadoop fs -du /hdfs-file-path
or
$ hdfs dfs -du /hdfs-file-path
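
Since the reported size is the size before replication, the raw space consumed on the cluster can be estimated by multiplying it by the replication factor. This is a back-of-the-envelope sketch that assumes plain replication (not erasure coding). The function name raw_bytes and the 128 MiB example are hypothetical.

```shell
# Estimated raw bytes on disk = size before replication x replication factor.
# $1 = size in bytes as reported by du, $2 = replication factor.
raw_bytes() {
  echo $(( $1 * $2 ))
}

# Hypothetical example: a 128 MiB file (134217728 bytes) stored with replication factor 3.
raw_bytes 134217728 3
```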

dus – Total Size of a Directory or File

The dus command gives the total size of a directory or file. In current Hadoop releases, -dus is deprecated in favor of -du -s.

$ hadoop fs -dus /hdfs-directory
or
$ hdfs dfs -dus /hdfs-directory

get – Copy the File from HDFS to Local

The get command copies files from HDFS to the local file system. Specify the HDFS path first, followed by the local destination.

$ hadoop fs -get /hdfs-file-path /local-file-path
or
$ hdfs dfs -get /hdfs-file-path /local-file-path

getmerge – Merge Multiple Files in an HDFS

If you have multiple files in HDFS, use the -getmerge command to merge them into one single file and download it to the local file system. The optional -nl flag adds a newline character at the end of each file.

$ hadoop fs -getmerge [-nl] /source /local-destination
or
$ hdfs dfs -getmerge [-nl] /source /local-destination

count – Number of Directories and Files

The count command counts the number of directories, the number of files, and the file size under a path on HDFS.

$ hadoop fs -count /hdfs-file-path
or
$ hdfs dfs -count /hdfs-file-path
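
The count output is a single line of numbers, which is easy to misread. As a small sketch, the function below labels each field, assuming the default column order DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME and a path without spaces. The name summarize_count and the sample line are hypothetical.

```shell
# Turn one line of -count output into a labelled summary.
# Expected columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
summarize_count() {
  awk '{ printf "%s: %s dirs, %s files, %s bytes\n", $4, $1, $2, $3 }'
}

# Hypothetical sample line; on a cluster, pipe `hdfs dfs -count /hdfs-file-path` into the function instead.
echo "           2            5               1024 /user/hadoop" | summarize_count
```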

mv – Moves Files from Source to Destination

The mv (move) command moves files from one location to another within HDFS. It also accepts multiple sources, in which case the destination needs to be a directory.

$ hadoop fs -mv /hdfs-source-path /hdfs-destination-path
or
$ hdfs dfs -mv /hdfs-source-path /hdfs-destination-path

moveFromLocal – Move file / Folder from Local disk to HDFS

Similar to the put command, moveFromLocal moves the file or source from the local file path to the destination in the HDFS file path. After this command, you will not find the file on the local file system.

$ hadoop fs -moveFromLocal /local-file-path /hdfs-file-path
or
$ hdfs dfs -moveFromLocal /local-file-path /hdfs-file-path

moveToLocal – Move a File from HDFS to Local

Similar to the get command, moveToLocal moves the file or source from the HDFS file path to the destination in the local file path. Note that in many Hadoop releases this command is not implemented and only displays a “Not implemented yet” message.

$ hadoop fs -moveToLocal /hdfs-file-path /local-file-path
or
$ hdfs dfs -moveToLocal /hdfs-file-path /local-file-path

cp – Copy Files from Source to Destination

The cp command copies files from one location to another in HDFS. It also accepts multiple sources, in which case the destination must be a directory.

$ hadoop fs -cp /hdfs-source-path /hdfs-destination-path
or
$ hdfs dfs -cp /hdfs-source-path /hdfs-destination-path

setrep – Changes the Replication Factor of a File

This HDFS command changes the replication factor of a file. If the path is a directory, the command recursively changes the replication factor of all files under the directory tree rooted at the path. The replication factor is a plain number, not a path.

$ hadoop fs -setrep [-w] number /file-name
or
$ hdfs dfs -setrep [-w] number /file-name
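
A common mistake is passing something other than a positive integer as the replication factor. The sketch below builds a setrep command line and rejects invalid values before anything reaches the cluster. The function name setrep_cmd is an assumption of this example. The -w flag asks the command to wait until replication completes.

```shell
# Build a -setrep command line, rejecting anything that is not a positive integer
# without leading zeros. The command is only printed, not executed.
# $1 = replication factor, $2 = HDFS path.
setrep_cmd() {
  case "$1" in
    ''|*[!0-9]*|0*)
      echo "replication factor must be a positive integer" >&2
      return 1
      ;;
  esac
  echo "hdfs dfs -setrep -w $1 $2"
}

setrep_cmd 3 /file-name
```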

tail – Displays Last Kilobyte of the File

The tail command displays the last kilobyte of the file on stdout.

$ hadoop fs -tail /hdfs-file-path
or
$ hdfs dfs -tail /hdfs-file-path

touch – Create and Modify Timestamps of a File

The touch command updates the access and modification times of the file specified by the URI to the current time. If the file does not exist, a zero-length file is created at the URI with the current time as its timestamp.

$ hadoop fs -touch /hdfs-file-path
or
$ hdfs dfs -touch /hdfs-file-path

touchz – Create a File of zero Length

Creates a new file of zero length on HDFS. An error is returned if the file already exists with non-zero length.

$ hadoop fs -touchz /hdfs-file-path
or
$ hdfs dfs -touchz /hdfs-file-path

appendToFile – Appends the Content to the File

Appends content to a file that is present on HDFS. You can append a single source or multiple sources from the local file system; the command appends the contents of all the given local files to the destination file on HDFS.

$ hadoop fs -appendToFile /local-file-path /hdfs-file-path
or
$ hdfs dfs -appendToFile /local-file-path /hdfs-file-path

copyFromLocal – Copy File from Local file System

Copies a file from the local file system to HDFS. It is similar to the fs -put command, except that the source is restricted to a local file reference.

$ hadoop fs -copyFromLocal /local-file-path /hdfs-file-path
or
$ hdfs dfs -copyFromLocal /local-file-path /hdfs-file-path

copyToLocal – Copy Files from HDFS to Local file System

Copies files from HDFS to the local file system. It is similar to the fs -get command, except that the destination is restricted to a local file reference.

$ hadoop fs -copyToLocal /hdfs-file-path /local-file-path
or
$ hdfs dfs -copyToLocal /hdfs-file-path /local-file-path

usage – Return the Help for Individual Command

The usage command returns the help for an individual command.

$ hadoop fs -usage mkdir
or
$ hdfs dfs -usage mkdir

checksum – Returns the Checksum Information of a File

The checksum command returns the checksum information of a file.

$ hadoop fs -checksum [-v] URI
or
$ hdfs dfs -checksum [-v] URI

chgrp – Change Group Association of Files

The chgrp command changes the group of a file or a path. The user must be the owner of the files, or else a super-user.

$ hadoop fs -chgrp [-R] groupname hdfs-file-path
or
$ hdfs dfs -chgrp [-R] groupname hdfs-file-path

chmod – Change the Permissions of a File

This command changes the permissions of a file. The -R option applies the change recursively, and it is the only option currently supported. The new mode is given before the path.

$ hadoop fs -chmod [-R] mode hdfs-file-path
or
$ hdfs dfs -chmod [-R] mode hdfs-file-path
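
When the mode is given in octal, it helps to see which permission string it corresponds to in an ls listing. The helper below is a small, purely local sketch of that mapping for three-digit modes. The name octal_to_rwx is hypothetical, and special bits such as the sticky bit are not handled.

```shell
# Convert a three-digit octal mode (for example 750) into the rwx string shown by ls.
# Each digit maps to one read/write/execute triplet: owner, group, others.
octal_to_rwx() {
  out=""
  for digit in $(echo "$1" | sed 's/./& /g'); do
    case "$digit" in
      0) out="${out}---" ;;
      1) out="${out}--x" ;;
      2) out="${out}-w-" ;;
      3) out="${out}-wx" ;;
      4) out="${out}r--" ;;
      5) out="${out}r-x" ;;
      6) out="${out}rw-" ;;
      7) out="${out}rwx" ;;
    esac
  done
  echo "$out"
}

octal_to_rwx 750
```

For example, running hdfs dfs -chmod -R 750 on a directory would give the owner full access, the group read and execute access, and others no access.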

chown – Change the Owner and Group of a File

The chown command changes the owner and group of a file. This command is similar to the shell’s chown command, with a few exceptions.

$ hadoop fs -chown [-R] [owner][:[group]] hdfs-file-path
or
$ hdfs dfs -chown [-R] [owner][:[group]] hdfs-file-path

df – Displays free Space

The df command displays free space. It shows the capacity, free space, and used space of the HDFS file system. With the -h option, sizes are formatted in a human-readable manner rather than as a number of bytes.

$ hadoop fs -df /user/hadoop/dir1
or
$ hdfs dfs -df /user/hadoop/dir1

head – Displays first Kilobyte of the File

The head command displays the first kilobyte of the file on stdout.

$ hadoop fs -head /hdfs-file-path
or
$ hdfs dfs -head /hdfs-file-path

createSnapshot – Create a Snapshot

Creates a snapshot of a snapshottable directory. This operation requires owner privilege on the snapshottable directory. The path argument is the path of the snapshottable directory, and snapshotName is the snapshot name; if the name is omitted, a default name is generated using a timestamp.

$ hadoop fs -createSnapshot /path snapshotName
or
$ hdfs dfs -createSnapshot /path snapshotName

deleteSnapshot – Delete a Snapshot

Deletes a snapshot from a snapshottable directory. This operation requires owner privilege on the snapshottable directory. The path argument is the path of the snapshottable directory, and snapshotName is the name of the snapshot to delete.

$ hadoop fs -deleteSnapshot /path snapshotName
or
$ hdfs dfs -deleteSnapshot /path snapshotName

renameSnapshot – Rename a Snapshot

Renames a snapshot. This operation requires owner privilege on the snapshottable directory.

$ hadoop fs -renameSnapshot /path oldName newName
or
$ hdfs dfs -renameSnapshot /path oldName newName
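
Putting the three snapshot commands together, a typical lifecycle might look like the dry-run sketch below. It assumes an administrator first marks the directory as snapshottable with hdfs dfsadmin -allowSnapshot, which this article does not otherwise cover. The run helper, the directory /data/warehouse, and the snapshot names are hypothetical.

```shell
# Dry-run helper: prints each command; set RUN=1 to also execute it on a real cluster.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Walk one snapshot through create, rename, list, and delete for the directory in $1.
snapshot_lifecycle() {
  dir="$1"
  run hdfs dfsadmin -allowSnapshot "$dir"                     # admin step: make the directory snapshottable
  run hdfs dfs -createSnapshot "$dir" before-load             # take a named snapshot
  run hdfs dfs -renameSnapshot "$dir" before-load baseline    # give it a more permanent name
  run hdfs dfs -ls "$dir/.snapshot"                           # snapshots appear under .snapshot
  run hdfs dfs -deleteSnapshot "$dir" baseline                # remove it when no longer needed
}

snapshot_lifecycle /data/warehouse
```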

expunge – Create New Checkpoint

This command empties the trash in an HDFS system. It permanently deletes files in checkpoints older than the retention threshold from the trash directory and creates a new checkpoint.

$ hadoop fs -expunge [-immediate] [-fs /hdfs-file-path]
or
$ hdfs dfs -expunge [-immediate] [-fs /hdfs-file-path]

stat – Print File/Directory Statistics

This command prints statistics about the file or directory at the given path in the specified format.

$ hadoop fs -stat [format] /hdfs-file-path
or
$ hdfs dfs -stat [format] /hdfs-file-path
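
The format argument is a string of specifiers such as %n or %b. The sketch below prints the meaning of a few commonly used specifiers and then shows how such a format string would be passed on the command line. The function name describe_stat_format is hypothetical, and the list of specifiers is deliberately partial.

```shell
# Print the meaning of each space-separated specifier in a -stat format string.
# Only a subset of specifiers is covered by this sketch.
describe_stat_format() {
  for token in $1; do
    case "$token" in
      %b) echo "%b = file size in bytes" ;;
      %F) echo "%F = type (regular file or directory)" ;;
      %g) echo "%g = group name of owner" ;;
      %n) echo "%n = name" ;;
      %o) echo "%o = block size" ;;
      %r) echo "%r = replication" ;;
      %u) echo "%u = user name of owner" ;;
      %y) echo "%y = modification date" ;;
      *)  echo "$token = not covered by this sketch" ;;
    esac
  done
}

describe_stat_format "%n %b %r"

# The same format string as it would be passed to the command (printed only, not executed):
echo 'hdfs dfs -stat "%n %b %r" /hdfs-file-path'
```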

truncate – Truncate Files to a Specified Length

Truncates all files that match the specified file pattern to the specified length. The -w flag requests that the command wait for block recovery to complete, if necessary.

$ hadoop fs -truncate [-w] length /hdfs-file-path
or
$ hdfs dfs -truncate [-w] length /hdfs-file-path

find – Find Files in HDFS

In Hadoop, the hdfs dfs -find or hadoop fs -find command finds all files that match the specified expression and applies the selected action to them. If no path is specified, it defaults to the current working directory; if no expression is specified, it defaults to -print.

$ hadoop fs -find / -name test -print
or
$ hdfs dfs -find / -name test -print

Related Articles

  • Hadoop Copy Local File to HDFS – PUT Command
  • Hadoop Get File From HDFS to Local
  • Hadoop FS – How to List Files in HDFS
  • Hadoop Count Command – Returns HDFS File Size and File Counts
  • Hadoop – How To Get HDFS File Size(DU)
  • Hadoop “WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform” warning

Article information

Author: Neely Ledner

Last Updated: 05/09/2023
