1:NameNode and DataNodes
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
In short, HDFS has a master/slave architecture. An HDFS cluster consists of:
A single NameNode: a master server that manages the file system namespace and regulates client access to files.
Multiple DataNodes: usually one per node in the cluster, which manage the storage attached to the nodes they run on.
HDFS exposes a file system namespace and lets user data be stored as files within that namespace.
Internally, a file is split into one or more blocks, and these blocks are stored on a set of DataNodes.
The NameNode performs namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes.
DataNodes serve read and write requests from the file system's clients.
DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
The NameNode and DataNode are pieces of software designed to run on commodity machines. These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Usage of the highly portable Java language means that HDFS can be deployed on a wide range of machines. A typical deployment has a dedicated machine that runs only the NameNode software. Each of the other machines in the cluster runs one instance of the DataNode software. The architecture does not preclude running multiple DataNodes on the same machine but in a real deployment that is rarely the case.
The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. The NameNode is the arbitrator and repository for all HDFS metadata. The system is designed in such a way that user data never flows through the NameNode.
Having a single NameNode in the cluster greatly simplifies the system architecture.
The NameNode is the arbitrator and repository for all HDFS metadata, and the system is designed so that user data never flows through the NameNode.
(In other words, the NameNode is only a manager; actual data reads and writes go directly to the DataNodes.)
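To see the block-to-DataNode mapping that the NameNode maintains, you can query it from the command line. A minimal sketch, assuming a running cluster and a hypothetical file path /user/test/file.txt:
hdfs fsck /user/test/file.txt -files -blocks -locations   # lists each block of the file and the DataNodes holding its replicas
hdfs dfsadmin -report                                     # summarizes the DataNodes currently registered with the NameNode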
Deploying a fully distributed HBase cluster:
Machine preparation:
VM Wolfs.Master
VM Wolfs.Node1
VM Wolfs.Node2
Edit /etc/hosts (*.*.*.* stands for each machine's IP address):
*.*.*.86 Wolfs.Master
*.*.*.* Wolfs.Node1
*.*.*.* Wolfs.Node2
The ZooKeeper directory should preferably be the same on all three machines.
1: Set up the ZooKeeper cluster
(1): Edit the configuration file zoo.cfg
Add the following:
server.1=Wolfs.Node1:2888:3888
server.2=Wolfs.Master:2888:3888
server.3=Wolfs.Node2:2888:3888
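For context, these server.N entries sit alongside the usual base settings in zoo.cfg. A minimal sketch, where dataDir and the port values are assumptions rather than values taken from this setup:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data   # assumed absolute path; should point at the data directory created in step (2) below
clientPort=2181
server.1=Wolfs.Node1:2888:3888
server.2=Wolfs.Master:2888:3888
server.3=Wolfs.Node2:2888:3888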
(2): Create a data directory under the ZooKeeper root directory
vi $zookeeper_home/data/myid
Set the file content to the number after "server.": for example, for server.1 the machine's $zookeeper_home/data/myid file should contain: 1
Edit the myid files for server.2 and server.3 on the corresponding machines in the same way;
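As a quick sketch (assuming $zookeeper_home is set on each machine), the myid files can be written like this:
# on Wolfs.Node1 (server.1)
echo 1 > $zookeeper_home/data/myid
# on Wolfs.Master (server.2)
echo 2 > $zookeeper_home/data/myid
# on Wolfs.Node2 (server.3)
echo 3 > $zookeeper_home/data/myid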
(3): Run zkServer.sh on all three machines to start the three ZooKeeper instances;
After starting, ./zkServer.sh status shows whether the current ZooKeeper instance is the leader or a follower.
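For example (assuming you run these from $zookeeper_home/bin on each machine):
./zkServer.sh start    # start the local ZooKeeper instance
./zkServer.sh status   # prints "Mode: leader" or "Mode: follower" once the quorum is up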
Reference for the ZooKeeper cluster setup:
2: Set up the Hadoop cluster
Reference:
I used Hadoop version 2.7.3; the basic configuration follows the referenced article;
Set up passwordless SSH login, configure core-site.xml, yarn-site.xml, and hadoop-env.sh, then copy them to Node1 and Node2; finally, run sbin/start-all.sh on the Master to bring the cluster up (a sketch of these steps follows).
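A minimal sketch of those steps; the fs.defaultFS value assumes the hdfs://Wolfs.Master:9000 address that hbase.rootdir uses below, and the exact property set depends on your Hadoop 2.7.3 layout:
# passwordless SSH from the Master to itself and the nodes
ssh-keygen -t rsa
ssh-copy-id Wolfs.Master
ssh-copy-id Wolfs.Node1
ssh-copy-id Wolfs.Node2
# hadoop-env.sh: point JAVA_HOME at your JDK
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk   # assumed path
# core-site.xml (minimal)
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Wolfs.Master:9000</value>
  </property>
</configuration>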
3: Set up the HBase cluster
I used HBase version 1.2.3, and the configuration is relatively simple;
1: Edit conf/regionservers
# add the following two lines
Wolfs.Node1
Wolfs.Node2
2: Configure hdfs-site.xml; its contents can simply mirror the hdfs-site.xml configuration from Hadoop (a minimal sketch follows);
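A minimal sketch of such an hdfs-site.xml; the directory paths are assumptions and should match your Hadoop installation:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/dfs/name</value>   <!-- assumed path -->
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/dfs/data</value>   <!-- assumed path -->
  </property>
</configuration>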
Then configure hbase-site.xml:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://Wolfs.Master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>Wolfs.Master,Wolfs.Node1,Wolfs.Node2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
3: Configure hbase-env.sh (see the sketch below);
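hbase-env.sh usually needs at least the following two lines when an external ZooKeeper quorum like the one above is used; the JAVA_HOME path here is an assumption:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk   # assumed JDK path
export HBASE_MANAGES_ZK=false                  # use the external ZooKeeper cluster instead of HBase's bundled one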
Finally, run bin/start-hbase.sh to start HBase;
At this point, the HBase cluster setup is complete.
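As a quick sanity check, jps on each machine should show the expected daemons:
jps
# on Wolfs.Master: expect HMaster (plus NameNode and QuorumPeerMain from the earlier steps)
# on Wolfs.Node1 / Wolfs.Node2: expect HRegionServer, DataNode, and QuorumPeerMain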