The HDFS is a distributed, scalable, and portable filesystem written in Java for the Hadoop framework. Each node in a Hadoop instance typically has a single datanode; a cluster of datanodes form the HDFS cluster. The situation is typical because each node does not require a datanode to be present. Each datanode serves up blocks of data over the network using a block protocol specific to HDFS. The filesystem uses the TCP/IP layer for communication; clients use RPC to communicate between each other. The HDFS stores large files (an ideal file size is a multiple of 64 MB), across multiple machines. It achieves reliability by replicating the data across multiple hosts, and hence does not require RAID storage on hosts. With the default replication value, 3, data is stored on three nodes: two on the same rack, and one on a different rack. Data nodes can talk to each other to rebalance data, to move copies around, and to keep the replication of data high. HDFS is not fully POSIX compliant because the requirements for a POSIX filesystem differ from the target goals for a Hadoop application. The tradeoff of not having a fully POSIX compliant filesystem is increased performance for data throughput. The HDFS was designed to handle very large files.
The HDFS does not provide high availability, because an HDFS filesystem instance requires one unique server, the name node. This is a single point of failure for an HDFS installation. If the name node goes down, the filesystem is offline. When it comes back up, the name node must replay all outstanding operations. This replay process can take over half an hour for a big cluster.[10] The filesystem includes what is called a Secondary Namenode, which misleads some people into thinking that when the Primary Namenode goes offline, the Secondary Namenode takes over. In fact, the Secondary Namenode regularly connects with the Primary Namenode and builds snapshots of the Primary Namenode's directory information, which is then saved to local/remote directories. These checkpointed images can be used to restart a failed Primary Namenode without having to replay the entire journal of filesystem actions, the edit log to create an up-to-date directory structure.
An advantage of using the HDFS is data awareness between the jobtracker and tasktracker. The jobtracker schedules map/reduce jobs to tasktrackers with an awareness of the data location. An example of this would be if node A contained data (x,y,z) and node B contained data (a,b,c). The jobtracker will schedule node B to perform map/reduce tasks on (a,b,c) and node A would be scheduled to perform map/reduce tasks on (x,y,z). This reduces the amount of traffic that goes over the network and prevents unnecessary data transfer. When Hadoop is used with other filesystems this advantage is not always available. This can have a significant impact on the performance of job completion times, which has been demonstrated when running data intensive jobs.
Another limitation of HDFS is that it cannot be directly mounted by an existing operating system. Getting data into and out of the HDFS file system, an action that often needs to be performed before and after executing a job, can be inconvenient. A Filesystem in Userspace (FUSE) virtual file system has been developed to address this problem, at least for Linux and some other Unix systems.
File access can be achieved through the native Java API, the Thrift API to generate a client in the language of the users' choosing (C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml), the command-line interface, or browsed through the HDFS-UI webapp over HTTP.
Cloud Computing
Yahoo! Launches World's Largest Hadoop Production Application
Hadoop creator goes to Cloudera
The Hadoop Distributed File System: Architecture and Design
HADOOP-6704: add support for Parascale filesystem