2 Comments

  1. Hi. I believe there’s a mistake in the description of corrupt blocks: “A block is called corrupt by HDFS if it has at least one corrupt replica along.” I think all replicas must be corrupt for the block to be marked as corrupt, not just one.

    JIRA HDFS-7281 explains the following:
    1. A block is missing if and only if all DNs of its expected replicas are dead.
    2. A block is corrupted if and only if all its available replicas are corrupted. So if a block has 3 replicas, one of the DNs is dead, and the other two replicas are corrupted, it will be marked as corrupted.

    1. Hi Jim, yes, the description was not accurate. I fixed it to make it clearer: “A block is ‘with corrupt replicas’ in HDFS if it has at least one corrupt replica along with at least one live replica.”

      It refers to the “Blocks with corrupt replicas” count in the HDFS report.
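
The three definitions discussed in this thread (missing, corrupt, and "with corrupt replicas") can be sketched as a small classifier. This is only a toy model of the rules as worded in the comments above, not HDFS's actual implementation; the `Replica` enum, the `classify_block` function, and the returned labels are hypothetical names chosen for illustration.

```python
from enum import Enum


class Replica(Enum):
    """Hypothetical replica states used only for this illustration."""
    LIVE = "live"        # stored on a live DataNode and healthy
    CORRUPT = "corrupt"  # stored on a live DataNode but corrupted
    DEAD = "dead"        # the DataNode holding this replica is dead


def classify_block(replicas):
    """Classify a block from the states of its expected replicas."""
    # Replicas on dead DataNodes are not available at all.
    available = [r for r in replicas if r is not Replica.DEAD]

    # Missing: every DataNode holding an expected replica is dead.
    if not available:
        return "missing"

    corrupt = [r for r in available if r is Replica.CORRUPT]

    # Corrupt: all *available* replicas are corrupted.
    if len(corrupt) == len(available):
        return "corrupt"

    # With corrupt replicas: at least one corrupt and at least one live replica.
    if corrupt:
        return "with corrupt replicas"

    return "healthy"


# The example from the comment: 3 replicas, one DN dead, two corrupted.
example = [Replica.DEAD, Replica.CORRUPT, Replica.CORRUPT]
print(classify_block(example))
```

Under these assumptions, the example from the first comment is classified as corrupt, while a block with one corrupt and two live replicas only counts as a block with corrupt replicas.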
