The Hadoop Distributed File System (HDFS) has become an essential pillar in the world of big data processing. As an integral part of the Apache Hadoop ecosystem, understanding HDFS is key to harnessing the power of distributed storage. Whether you’re a newcomer to Hadoop or looking to test your foundational knowledge, this beginner-friendly quiz on HDFS is for you. Let’s begin!
1. What does HDFS stand for?
Answer: Hadoop Distributed File System.
Explanation:
HDFS stands for Hadoop Distributed File System. It's the primary storage system used by Hadoop applications.
2. In HDFS architecture, which component manages the metadata?
Answer: The NameNode.
Explanation:
In HDFS, the NameNode is responsible for storing and managing metadata, while actual data is stored in DataNodes.
3. Which default replication factor does HDFS use for data reliability?
Answer: Three.
Explanation:
By default, HDFS replicates each block three times to ensure data reliability and fault tolerance.
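Replication trades raw storage for durability. As a rough illustration (a toy sketch, not the HDFS accounting logic itself), the cluster storage a file consumes is approximately its logical size multiplied by the replication factor:

```python
def raw_storage_bytes(file_size_bytes: int, replication_factor: int = 3) -> int:
    """Approximate raw cluster storage consumed by a file in HDFS.

    Note: HDFS does not pad partial blocks, so a 1 MB file consumes
    ~1 MB per replica, not a full block per replica.
    """
    return file_size_bytes * replication_factor

# A 1 GB file under the default replication factor of 3
# occupies roughly 3 GB of raw cluster storage.
one_gb = 1024 ** 3
print(raw_storage_bytes(one_gb))  # 3221225472
```

This is why capacity planning for an HDFS cluster typically starts from usable storage divided by the replication factor.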
4. The primary programming language used to develop HDFS is:
Answer: Java.
Explanation:
HDFS, as a part of the Hadoop ecosystem, is primarily written in Java.
5. What is the default block size in HDFS (in Hadoop 2.x)?
Answer: 128 MB.
Explanation:
In Hadoop 2.x, the default block size for HDFS is 128 MB. This is larger than typical file systems to minimize the cost of seeks and to handle large datasets efficiently.
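The block count for a file follows directly from this default. A quick sketch (assuming the 128 MB Hadoop 2.x default; the actual value is governed by the `dfs.blocksize` setting):

```python
import math

DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the Hadoop 2.x default

def block_count(file_size_bytes: int, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Number of HDFS blocks a file occupies.

    The last block may be only partially filled; HDFS does not pad it.
    """
    return math.ceil(file_size_bytes / block_size)

# A 300 MB file splits into three blocks: 128 MB + 128 MB + 44 MB.
print(block_count(300 * 1024 * 1024))  # 3
```

The large block size means a multi-gigabyte file maps to a few dozen blocks rather than thousands, which keeps the NameNode's metadata load and disk-seek overhead low.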
6. In the context of HDFS, what does 'Write once, Read many times' imply?
Answer: Files are written once and then read many times; existing file contents are not modified.
Explanation:
This design principle means that in HDFS, data files are primarily immutable. This minimizes data coherency issues and optimizes for data retrieval operations.
7. What role does the Secondary NameNode play in HDFS?
Answer: It performs periodic checkpointing of the NameNode's metadata; it is not a standby NameNode.
Explanation:
The Secondary NameNode periodically merges the changes (edits) with the filesystem image (fsimage) and creates a new fsimage. While it helps in creating checkpoints, it's not a failover for the primary NameNode.
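The checkpoint idea can be modeled in a few lines. This is a deliberately simplified toy (a dict standing in for the namespace, tuples standing in for edit-log records), not the actual fsimage/edits format:

```python
# Toy model of checkpointing: the fsimage is a snapshot of namespace metadata,
# and the edit log records every mutation made since that snapshot. A checkpoint
# replays the edits onto the image, yielding a fresh, up-to-date fsimage.

def checkpoint(fsimage: dict, edit_log: list) -> dict:
    """Replay logged operations onto a copy of the fsimage (toy namespace model)."""
    new_image = dict(fsimage)  # never mutate the old snapshot in place
    for op, path, *args in edit_log:
        if op == "create":
            new_image[path] = args[0] if args else {}
        elif op == "delete":
            new_image.pop(path, None)
    return new_image

image = {"/data/a.txt": {"replication": 3}}
edits = [("create", "/data/b.txt", {"replication": 3}),
         ("delete", "/data/a.txt")]
print(checkpoint(image, edits))  # {'/data/b.txt': {'replication': 3}}
```

Without periodic checkpoints the edit log would grow without bound, and a NameNode restart would have to replay the entire history; merging keeps restarts fast.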
8. DataNodes in HDFS periodically send which of the following to the NameNode?
Answer: Heartbeats.
Explanation:
DataNodes send heartbeats to the NameNode to signal that they are operational. If the NameNode doesn't receive a heartbeat from a DataNode after a certain period, it marks the DataNode as unavailable.
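The liveness check itself is simple time arithmetic. A minimal sketch (the ~630-second default shown here derives from the `dfs.namenode.heartbeat.recheck-interval` and `dfs.heartbeat.interval` settings; treat the exact figure as configurable):

```python
import time

DEAD_NODE_TIMEOUT_SECONDS = 630  # ~10.5 minutes under default settings

def is_datanode_alive(last_heartbeat: float, now: float,
                      timeout: float = DEAD_NODE_TIMEOUT_SECONDS) -> bool:
    """A DataNode is considered live only while heartbeats arrive within the timeout."""
    return (now - last_heartbeat) <= timeout

now = time.time()
print(is_datanode_alive(now - 5, now))     # True: heartbeat arrived 5 s ago
print(is_datanode_alive(now - 1200, now))  # False: silent for 20 minutes
```

Once a DataNode is marked dead, the NameNode schedules re-replication of the blocks it held so every block returns to its target replication factor.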
9. Which of the following operations is NOT supported by HDFS?
Answer: Random writes to existing files.
Explanation:
HDFS follows the 'Write once, Read many times' model, which means data files are immutable after creation. Hence, random writes to existing files are not supported.
10. HDFS is designed around which assumption about hardware?
Answer: That hardware failure is the norm rather than the exception.
Explanation:
HDFS is built for fault tolerance on large clusters of commodity hardware. It assumes components will fail regularly and recovers automatically through block replication, keeping data reliable and the system available.
11. What is a DataNode in HDFS?
Answer: The node that stores the actual data blocks.
Explanation:
A DataNode in HDFS is responsible for storing the actual data blocks. DataNodes are the workhorses of HDFS, providing storage and data retrieval services.
12. Which tool can be used to import/export data from RDBMS to HDFS?
Answer: Sqoop.
Explanation:
Sqoop is a tool designed to transfer data between Hadoop and relational database systems. It facilitates the import and export of data between HDFS and RDBMS.
13. What is the replication factor in HDFS?
Answer: The number of copies of each data block stored in the cluster (three by default).
Explanation:
The replication factor in HDFS refers to the number of copies of a data block that are stored. By default, this number is set to three, ensuring data reliability and fault tolerance.