
Hadoop Cluster Deployment: Fully Distributed Mode


Contents

I. Overview
II. Architecture
III. Deployment
    1. Base environment configuration
    2. Create the hadoop user and generate keys
    3. Configure passwordless SSH between the three servers
    4. Install ZooKeeper
    5. Install the JDK and Hadoop
    6. Configure environment variables
    7. Start ZooKeeper
    8. Configure HDFS
    9. Start the JournalNodes
    10. Format the master node
    11. Configure YARN
    12. Enable the Hadoop HistoryServer
IV. Troubleshooting
    1. Operation category READ is not supported in state standby
    2. Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
    3. When checking IP:9870, HA testing shows all NameNodes as Standby with no automatic failover
V. Optional configuration
VI. Common commands

I. Overview

Versions

jdk-8u333

Hadoop-3.3.2

Zookeeper-3.5.7

Hadoop is a distributed computing framework for processing large-scale datasets. It is designed to deliver reliable, high-performance data storage and processing on low-cost hardware, so that organizations can extract value from big data.

ZooKeeper is an open-source distributed coordination service for managing and coordinating configuration, state, naming, and similar information in large distributed systems.

In the Hadoop ecosystem, the Hadoop Distributed File System (HDFS) and the YARN resource manager (Yet Another Resource Negotiator) are the two core components, responsible for storage and for resource management respectively. The daemons of HDFS and YARN are explained below:

HDFS daemons:

NameNode: the central management node of HDFS, responsible for the file system namespace and metadata. It maintains the file and directory hierarchy plus the block list and block locations of every file. The NameNode records all file system metadata but stores no actual file data. It is HDFS's single point of failure, which makes its high availability especially important; in an HA setup there are two NameNodes, one active and one standby.

SecondaryNameNode: not a standby for the NameNode. Its job is to periodically merge the NameNode's edit log into a new image file, keeping the edit log small and helping cluster stability. After merging, it ships the new image file back to the NameNode, reducing the NameNode's load.

DataNode: the node that actually stores HDFS data blocks. It stores blocks, serves client and NameNode requests, performs block replication and recovery, and reports block health. DataNodes are spread across the cluster and hold the actual file data.

YARN daemons:

ResourceManager: the central management node of YARN, responsible for resource management and allocation across the whole cluster. It receives resource requests from clients and applications and assigns cluster resources to the NodeManagers. It tracks the cluster's resource capacity and usage as well as each application's allocation. In an HA setup there can also be a standby ResourceManager.

NodeManager: runs on every node in the cluster and manages that node's resources and containers. It receives allocation requests from the ResourceManager, creates and manages containers (which encapsulate an application's processes and resources), monitors container execution, and reports the node's health and resource utilization back to the ResourceManager.

Summary:

The HDFS daemons (NameNode, SecondaryNameNode, and DataNode) handle file system storage, metadata management, and block storage.

The YARN daemons (ResourceManager and NodeManager) handle cluster resource management and allocation, and the execution and monitoring of applications.

Together these two components form the core foundation of a Hadoop cluster; a quick way to check that the expected daemons are running is sketched below.
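Once the cluster is fully deployed (end of section III), jps on each node should show the daemons listed in the comments below. A minimal sketch; the per-node layout follows this guide's host plan and is an assumption to adjust to your own.

# 'jps' lists the running JVM processes, i.e. the Hadoop/ZooKeeper daemons.
# Expected here (assumption based on this guide's layout):
#   every node: QuorumPeerMain, JournalNode, DataNode, NodeManager
#   master, slave1, slave2 additionally: NameNode + DFSZKFailoverController
#   master and slave1 additionally: ResourceManager
for host in master slave1 slave2; do
  echo "== $host =="
  ssh "hadoop@$host" jps
done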

In a Hadoop cluster, ZooKeeper is a key distributed coordination service that manages and maintains configuration information, state information, and coordination operations between the cluster's nodes. It provides highly reliable distributed data storage and synchronization, helping to coordinate and manage the components of a distributed system. Its roles in a Hadoop cluster are:

Configuration management: ZooKeeper stores and manages configuration information for the cluster's components, such as node IP addresses, ports, and roles. Nodes can share and fetch this information through ZooKeeper, giving unified configuration management.

Leader election: some components in a distributed system need to elect a master (leader) to guarantee high availability and failure recovery. ZooKeeper's distributed lock mechanism can implement such elections; for example, in Hadoop's HA setup ZooKeeper coordinates the NameNode active/standby switch (see the zkCli.sh sketch after this list).

Cluster state monitoring: nodes can register their state in ZooKeeper, and other nodes can watch for changes. This is important for monitoring cluster health and for failure detection and recovery.

Distributed locks: ZooKeeper supports distributed locks, letting multiple nodes coordinate access to shared resources and avoid conflicts and concurrency problems.

Distributed queues: ZooKeeper's sequential nodes can be used to implement distributed queues for task scheduling and message passing between nodes.

Summary: ZooKeeper's role in a Hadoop cluster is to provide distributed coordination and management, keeping the components consistent, reliable, and highly available in a distributed environment. It is an indispensable part of the Hadoop ecosystem.
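To make the primitives above concrete, here is a small zkCli.sh session (zkCli.sh ships with ZooKeeper; run it after section 7). The /hadoop-ha path is created later by hdfs zkfc -formatZK; the /demo-* paths are made-up examples.

zkCli.sh -server master:2181
# inside the zkCli shell:
ls /hadoop-ha                    # HA election/state znodes, present after 'hdfs zkfc -formatZK'
create /demo-queue ""            # plain persistent znode
create -e /demo-lock ""          # ephemeral znode: vanishes when this session ends (lock/election building block)
create -s /demo-queue/item ""    # sequential znode: the server appends a monotonic counter (queue building block)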

The main configuration files are core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. Every property in them is worth understanding, and this document explains each property it uses.

core-site.xml: configures Hadoop core properties, including the file system and I/O settings. Typical properties: the default file system (fs.defaultFS), the temporary directory (hadoop.tmp.dir), IPC settings, and so on. Used for file system, I/O, and other general settings; this file is required by every component in the cluster.

hdfs-site.xml: configures HDFS (distributed file system) properties. Typical properties: replica count (dfs.replication), block size (dfs.blocksize), NameNode data directory (dfs.namenode.name.dir), and so on. Used for HDFS storage, replication, and management.

mapred-site.xml: configures MapReduce properties. Typical properties: the framework name (mapreduce.framework.name), the job tracker address (mapreduce.jobtracker.address), and so on. Used for MapReduce job scheduling, execution, and management; in newer Hadoop versions some of this role may have moved into yarn-site.xml.

yarn-site.xml: configures YARN (resource manager) properties. Typical properties: the ResourceManager address (yarn.resourcemanager.address), the NodeManager address (yarn.nodemanager.address), and so on. Used for YARN resource scheduling, management, and job execution.
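Once these files are in place, you can query the value Hadoop actually resolves for any property without opening the XML. A small sketch; the property names are the ones configured later in this guide.

hdfs getconf -confKey fs.defaultFS       # e.g. hdfs://mycluster
hdfs getconf -confKey hadoop.tmp.dir
hdfs getconf -namenodes                  # hosts configured as NameNodes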

II. Architecture

The HDFS daemons are the NameNode, SecondaryNameNode, and DataNode. The YARN daemons are the ResourceManager and NodeManager.

Cluster host plan

Software plan

Hadoop 3.0+ requires at least Java 8.

Node user plan

For security, do not use the root user in a cluster environment; create the needed users and groups yourself and set passwords for them.

Software directory plan

Plan the software and data directories in advance; it makes management and maintenance easier.

III. Deployment

1. Base environment configuration

From here on, the node names are used to refer to the servers.

There are some optional settings; look at them when writing the yarn and mapred configuration files and decide whether to include them.

Many of the steps are identical on all three nodes. For completeness, every step is shown for each node; once you know the procedure you can do the work on master and push the results out with scp, as in the sketch below.
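For example, after editing /etc/hosts on master you could distribute it like this (a sketch; it assumes root SSH access to the slaves, which is the state of the machines at this early stage):

# push the hosts file from master to both slaves instead of editing it three times
for host in slave1 slave2; do
  scp /etc/hosts "root@$host:/etc/hosts"
done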

master

[root@bogon ~]# hostnamectl set-hostname master   # set the hostname
[root@bogon ~]# bash
[root@master ~]# cat << EOF >> /etc/hosts         # append to the hosts file
10.10.11.235 master
10.10.11.236 slave1
10.10.11.237 slave2
EOF
[root@master ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.11.235 master
10.10.11.236 slave1
10.10.11.237 slave2
[root@master ~]# systemctl stop firewalld
[root@master ~]# systemctl disable firewalld
[root@master ~]# setenforce 0
[root@master ~]# sed -i 's/enforcing/disabled/g' /etc/selinux/config
[root@master ~]# vi /etc/ssh/sshd_config   # search for "Pubkey" (/Pubkey) and uncomment it to allow key-based login
[root@master ~]# systemctl restart sshd
[root@master ~]# yum -y install net-tools bash-completion vim lrzsz nc wget ntpdate psmisc

slave1

Last login: Tue Aug 8 11:20:41 from 10.10.30.138
[root@bogon ~]# hostnamectl set-hostname slave1
[root@bogon ~]# bash
[root@slave1 ~]# cat << EOF >> /etc/hosts
10.10.11.235 master
10.10.11.236 slave1
10.10.11.237 slave2
EOF
[root@slave1 ~]# systemctl stop firewalld
[root@slave1 ~]# systemctl disable firewalld
[root@slave1 ~]# setenforce 0
[root@slave1 ~]# sed -i 's/enforcing/disabled/g' /etc/selinux/config
[root@slave1 ~]# vi /etc/ssh/sshd_config   # search for "Pubkey" and uncomment it
[root@slave1 ~]# systemctl restart sshd
[root@slave1 ~]# yum -y install net-tools bash-completion vim lrzsz nc wget ntpdate

slave2

Last login: Tue Aug 8 11:20:44 from 10.10.30.138
[root@localhost ~]# hostnamectl set-hostname slave2
[root@localhost ~]# bash
[root@slave2 ~]# cat << EOF >> /etc/hosts
10.10.11.235 master
10.10.11.236 slave1
10.10.11.237 slave2
EOF
[root@slave2 ~]# systemctl stop firewalld
[root@slave2 ~]# systemctl disable firewalld
[root@slave2 ~]# setenforce 0
[root@slave2 ~]# sed -i 's/enforcing/disabled/g' /etc/selinux/config
[root@slave2 ~]# vi /etc/ssh/sshd_config   # search for "Pubkey" and uncomment it
[root@slave2 ~]# systemctl restart sshd
[root@slave2 ~]# yum -y install net-tools bash-completion vim lrzsz nc wget ntpdate

2. Create the hadoop user and generate keys

master

[root@master ~]# mkdir -p /home/fosafer/hadoop
[root@master ~]# useradd hadoop
[root@master ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: the password contains the user name in some form
Retype new password:
passwd: all authentication tokens updated successfully.
[root@master ~]# chown -R hadoop:hadoop /home/fosafer/hadoop
[root@master ~]# su hadoop
[hadoop@master root]$ cd
[hadoop@master ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:uwX3cj+cABRWj+ZFIxjS/Mapca5Gt4j9+OlwK9Kiaw8 hadoop@master
The key's randomart image is:
(randomart image omitted)
[hadoop@master ~]$ chmod 700 -R .ssh
[hadoop@master ~]$ cd .ssh
[hadoop@master .ssh]$ cat id_rsa.pub >> authorized_keys

slave1

[root@slave1 ~]# mkdir -p /home/fosafer/hadoop
[root@slave1 ~]# useradd hadoop
[root@slave1 ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: the password contains the user name in some form
Retype new password:
passwd: all authentication tokens updated successfully.
[root@slave1 ~]# chown -R hadoop:hadoop /home/fosafer/hadoop
[root@slave1 ~]# su hadoop
[hadoop@slave1 root]$ cd
[hadoop@slave1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:a1LcB9Ljjkobt8gmCY1j5FVnzo/X4pnrkPSfZYbqi/g hadoop@slave1
The key's randomart image is:
(randomart image omitted)
[hadoop@slave1 ~]$ chmod 700 -R .ssh
[hadoop@slave1 ~]$ cd .ssh
[hadoop@slave1 .ssh]$ cat id_rsa.pub >> authorized_keys

slave2

[root@slave2 ~]# mkdir -p /home/fosafer/hadoop
[root@slave2 ~]# useradd hadoop
[root@slave2 ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: the password contains the user name in some form
Retype new password:
passwd: all authentication tokens updated successfully.
[root@slave2 ~]# chown -R hadoop:hadoop /home/fosafer/hadoop
[root@slave2 ~]# su hadoop
[hadoop@slave2 root]$ cd
[hadoop@slave2 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:t7NjluFyV1Ds1IczfVt7P5nfN4rvKcKLTP7mh6dNrA0 hadoop@slave2
The key's randomart image is:
(randomart image omitted)
[hadoop@slave2 ~]$ chmod 700 -R .ssh
[hadoop@slave2 ~]$ cd .ssh
[hadoop@slave2 .ssh]$ cat id_rsa.pub >> authorized_keys

3. Configure passwordless SSH between the three servers

master

[hadoop@master .ssh]$ ssh-copy-id -i id_rsa.pub hadoop@slave1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "id_rsa.pub"
The authenticity of host 'slave1 (10.10.11.236)' can't be established.
ECDSA key fingerprint is SHA256:uKAQhX4rSXHqZx7wc/Sh2OmHsRSDnlR9ruvJEqrms+E.
ECDSA key fingerprint is MD5:6b:b1:50:72:89:91:9f:af:c5:2b:18:7c:21:c2:6a:08.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@slave1'"
and check to make sure that only the key(s) you wanted were added.
[hadoop@master .ssh]$ ssh-copy-id -i id_rsa.pub hadoop@slave2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "id_rsa.pub"
The authenticity of host 'slave2 (10.10.11.237)' can't be established.
ECDSA key fingerprint is SHA256:PXM6aycRSAqCgOstlWWJ0xfWKlRuhe0daeW+Uw7mul0.
ECDSA key fingerprint is MD5:5d:6e:46:4f:0c:21:3d:d0:5c:70:b2:47:90:ee:1b:e3.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@slave2'"
and check to make sure that only the key(s) you wanted were added.

slave1

[hadoop@slave1 .ssh]$ ssh-copy-id -i id_rsa.pub hadoop@master
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "id_rsa.pub"
The authenticity of host 'master (10.10.11.235)' can't be established.
ECDSA key fingerprint is SHA256:pfF1t3LJkfWc6tBFNGY4JmM4ZGWgEu9+TNMeWg1FJ7o.
ECDSA key fingerprint is MD5:53:c6:99:80:a8:9c:ac:6c:e5:d7:ed:dd:70:68:22:c5.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@master's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@master'"
and check to make sure that only the key(s) you wanted were added.
[hadoop@slave1 .ssh]$ ssh-copy-id -i id_rsa.pub hadoop@slave2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "id_rsa.pub"
The authenticity of host 'slave2 (10.10.11.237)' can't be established.
ECDSA key fingerprint is SHA256:PXM6aycRSAqCgOstlWWJ0xfWKlRuhe0daeW+Uw7mul0.
ECDSA key fingerprint is MD5:5d:6e:46:4f:0c:21:3d:d0:5c:70:b2:47:90:ee:1b:e3.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@slave2'"
and check to make sure that only the key(s) you wanted were added.

slave2

[hadoop@slave2 .ssh]$ ssh-copy-id -i id_rsa.pub hadoop@master
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "id_rsa.pub"
The authenticity of host 'master (10.10.11.235)' can't be established.
ECDSA key fingerprint is SHA256:pfF1t3LJkfWc6tBFNGY4JmM4ZGWgEu9+TNMeWg1FJ7o.
ECDSA key fingerprint is MD5:53:c6:99:80:a8:9c:ac:6c:e5:d7:ed:dd:70:68:22:c5.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@master's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@master'"
and check to make sure that only the key(s) you wanted were added.
[hadoop@slave2 .ssh]$ ssh-copy-id -i id_rsa.pub hadoop@slave1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "id_rsa.pub"
The authenticity of host 'slave1 (10.10.11.236)' can't be established.
ECDSA key fingerprint is SHA256:uKAQhX4rSXHqZx7wc/Sh2OmHsRSDnlR9ruvJEqrms+E.
ECDSA key fingerprint is MD5:6b:b1:50:72:89:91:9f:af:c5:2b:18:7c:21:c2:6a:08.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@slave1'"
and check to make sure that only the key(s) you wanted were added.

Test logging in to each of the other machines without a password; if that works, move on. A quick way to check all directions at once is sketched below.
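A loop to verify the passwordless logins in one pass (a sketch; BatchMode makes ssh fail instead of prompting when a key is missing):

# run on each node as the hadoop user; every iteration should print the remote hostname
for host in master slave1 slave2; do
  ssh -o BatchMode=yes "hadoop@$host" hostname
done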

4. Install ZooKeeper

Switch back to the root user on all three servers and upload the ZooKeeper package.

master

[root@master ~]# cd /home/fosafer/hadoop/
[root@master hadoop]# ll
total 0
[root@master hadoop]# rz
ZMODEM Session started ------------------------ Sent apache-zookeeper-3.5.7-bin.tar.gz
[root@master hadoop]# ll
total 9096
-rw-r--r--. 1 root root 9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
[root@master hadoop]# tar -vxf apache-zookeeper-3.5.7-bin.tar.gz
[root@master hadoop]# mv apache-zookeeper-3.5.7-bin zookeeper-3.5.7
[root@master hadoop]# ll
total 9096
-rw-r--r--. 1 root root 9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 6 root root     134 Aug  8 15:11 zookeeper-3.5.7
[root@master hadoop]# cd zookeeper-3.5.7/conf
[root@master conf]# ll
total 12
-rw-r--r--. 1 502 games  535 May  4 configuration.xsl
-rw-r--r--. 1 502 games 2712 Feb  7 log4j.properties
-rw-r--r--. 1 502 games  922 Feb  7 zoo_sample.cfg
[root@master conf]# cp zoo_sample.cfg zoo.cfg
[root@master conf]# vim zoo.cfg   # set the data and log directories
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/fosafer/hadoop/zookeeper-3.5.7/zkdata
dataLogDir=/home/fosafer/hadoop/zookeeper-3.5.7/zklog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# /doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
# 1/2/3 are the server IDs; 2888 is the ZooKeeper peer-communication port; 3888 is the election port
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
# save and quit
[root@master conf]# mkdir /home/fosafer/hadoop/zookeeper-3.5.7/zkdata
[root@master conf]# mkdir /home/fosafer/hadoop/zookeeper-3.5.7/zklog
[root@master conf]# echo 1 > /home/fosafer/hadoop/zookeeper-3.5.7/zkdata/myid
[root@master conf]# cd /home/fosafer/
[root@master fosafer]# ll
total 0
drwxr-xr-x. 3 hadoop hadoop 70 Aug  8 15:11 hadoop
[root@master fosafer]# chown hadoop:hadoop -R hadoop

slave1

[root@slave1 ~]# cd /home/fosafer/hadoop/
[root@slave1 hadoop]# ll
total 0
[root@slave1 hadoop]# rz
ZMODEM Session started ------------------------ Sent apache-zookeeper-3.5.7-bin.tar.gz
[root@slave1 hadoop]# ll
total 9096
-rw-r--r--. 1 root root 9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
[root@slave1 hadoop]# tar -vxf apache-zookeeper-3.5.7-bin.tar.gz
[root@slave1 hadoop]# mv apache-zookeeper-3.5.7-bin zookeeper-3.5.7
[root@slave1 hadoop]# ll
total 9096
-rw-r--r--. 1 root root 9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 6 root root     134 Aug  8 15:11 zookeeper-3.5.7
[root@slave1 hadoop]# cd zookeeper-3.5.7/conf
[root@slave1 conf]# ll
total 12
-rw-r--r--. 1 502 games  535 May  4 configuration.xsl
-rw-r--r--. 1 502 games 2712 Feb  7 log4j.properties
-rw-r--r--. 1 502 games  922 Feb  7 zoo_sample.cfg
[root@slave1 conf]# cp zoo_sample.cfg zoo.cfg
[root@slave1 conf]# vim zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/fosafer/hadoop/zookeeper-3.5.7/zkdata
dataLogDir=/home/fosafer/hadoop/zookeeper-3.5.7/zklog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# /doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
[root@slave1 conf]# mkdir /home/fosafer/hadoop/zookeeper-3.5.7/zkdata
[root@slave1 conf]# mkdir /home/fosafer/hadoop/zookeeper-3.5.7/zklog
[root@slave1 conf]# echo 2 > /home/fosafer/hadoop/zookeeper-3.5.7/zkdata/myid
[root@slave1 conf]# cd /home/fosafer/
[root@slave1 fosafer]# ll
total 0
drwxr-xr-x. 3 hadoop hadoop 70 Aug  8 15:19 hadoop
[root@slave1 fosafer]# chown hadoop:hadoop -R hadoop

slave2

[root@slave2 ~]# cd /home/fosafer/hadoop/
[root@slave2 hadoop]# ll
total 0
[root@slave2 hadoop]# rz
ZMODEM Session started ------------------------ Sent apache-zookeeper-3.5.7-bin.tar.gz
[root@slave2 hadoop]# ll
total 9096
-rw-r--r--. 1 root root 9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
[root@slave2 hadoop]# tar -vxf apache-zookeeper-3.5.7-bin.tar.gz
[root@slave2 hadoop]# mv apache-zookeeper-3.5.7-bin zookeeper-3.5.7
[root@slave2 hadoop]# ll
total 9096
-rw-r--r--. 1 root root 9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 6 root root     134 Aug  8 15:11 zookeeper-3.5.7
[root@slave2 hadoop]# cd zookeeper-3.5.7/conf
[root@slave2 conf]# ll
total 12
-rw-r--r--. 1 502 games  535 May  4 configuration.xsl
-rw-r--r--. 1 502 games 2712 Feb  7 log4j.properties
-rw-r--r--. 1 502 games  922 Feb  7 zoo_sample.cfg
[root@slave2 conf]# cp zoo_sample.cfg zoo.cfg
[root@slave2 conf]# vim zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/fosafer/hadoop/zookeeper-3.5.7/zkdata
dataLogDir=/home/fosafer/hadoop/zookeeper-3.5.7/zklog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# /doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
[root@slave2 conf]# mkdir /home/fosafer/hadoop/zookeeper-3.5.7/zkdata
[root@slave2 conf]# mkdir /home/fosafer/hadoop/zookeeper-3.5.7/zklog
[root@slave2 conf]# echo 3 > /home/fosafer/hadoop/zookeeper-3.5.7/zkdata/myid
[root@slave2 conf]# cd /home/fosafer/
[root@slave2 fosafer]# ll
total 0
drwxr-xr-x. 3 hadoop hadoop 70 Aug  8 15:21 hadoop
[root@slave2 fosafer]# chown hadoop:hadoop -R hadoop

5. Install the JDK and Hadoop

Upload the JDK and Hadoop packages to all three servers.

master

[root@master ~]# cd /home/fosafer/hadoop/
[root@master hadoop]# ll
total 9096
-rw-r--r--. 1 hadoop hadoop 9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 8 hadoop hadoop     161 Aug  8 15:20 zookeeper-3.5.7
[root@master hadoop]# rz
ZMODEM Session started ------------------------ Sent jdk-8u333-linux-x64.tar.gz
[root@master hadoop]# tar -vxf jdk-8u333-linux-x64.tar.gz
[root@master hadoop]# ll
total 153632
-rw-r--r--. 1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 8  10143  10143       273 Apr 26 jdk1.8.0_333
-r--r--r--. 1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x. 8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7
[root@master hadoop]# chown hadoop:hadoop -R jdk1.8.0_333
[root@master hadoop]# ll
total 153632
-rw-r--r--. 1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 8 hadoop hadoop       273 Apr 26 jdk1.8.0_333
-r--r--r--. 1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x. 8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7
[root@master hadoop]# rz
ZMODEM Session started ------------------------ Sent hadoop-3.3.2.tar.gz
[root@master hadoop]# tar -vxf hadoop-3.3.2.tar.gz
[root@master hadoop]# ll
total 777324
-rw-r--r--.  1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 10    501 dialout      215 Feb 22 hadoop-3.3.2
-r--r--r--.  1 root   root   638660563 Aug  8 15:29 hadoop-3.3.2.tar.gz
drwxr-xr-x.  8 hadoop hadoop       273 Apr 26 jdk1.8.0_333
-r--r--r--.  1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x.  8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7
[root@master hadoop]# chown hadoop:hadoop -R hadoop-3.3.2
[root@master hadoop]# ll
total 777324
-rw-r--r--.  1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 10 hadoop hadoop       215 Feb 22 hadoop-3.3.2
-r--r--r--.  1 root   root   638660563 Aug  8 15:29 hadoop-3.3.2.tar.gz
drwxr-xr-x.  8 hadoop hadoop       273 Apr 26 jdk1.8.0_333
-r--r--r--.  1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x.  8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7

slave1

[root@slave1 ~]# cd /home/fosafer/hadoop/
[root@slave1 hadoop]# ll
total 9096
-rw-r--r--. 1 hadoop hadoop 9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 8 hadoop hadoop     161 Aug  8 15:20 zookeeper-3.5.7
[root@slave1 hadoop]# rz
ZMODEM Session started ------------------------ Sent jdk-8u333-linux-x64.tar.gz
[root@slave1 hadoop]# tar -vxf jdk-8u333-linux-x64.tar.gz
[root@slave1 hadoop]# ll
total 153632
-rw-r--r--. 1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 8  10143  10143       273 Apr 26 jdk1.8.0_333
-r--r--r--. 1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x. 8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7
[root@slave1 hadoop]# chown hadoop:hadoop -R jdk1.8.0_333
[root@slave1 hadoop]# ll
total 153632
-rw-r--r--. 1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 8 hadoop hadoop       273 Apr 26 jdk1.8.0_333
-r--r--r--. 1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x. 8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7
[root@slave1 hadoop]# rz
ZMODEM Session started ------------------------ Sent hadoop-3.3.2.tar.gz
[root@slave1 hadoop]# tar -vxf hadoop-3.3.2.tar.gz
[root@slave1 hadoop]# ll
total 777324
-rw-r--r--.  1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 10    501 dialout      215 Feb 22 hadoop-3.3.2
-r--r--r--.  1 root   root   638660563 Aug  8 15:29 hadoop-3.3.2.tar.gz
drwxr-xr-x.  8 hadoop hadoop       273 Apr 26 jdk1.8.0_333
-r--r--r--.  1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x.  8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7
[root@slave1 hadoop]# chown hadoop:hadoop -R hadoop-3.3.2
[root@slave1 hadoop]# ll
total 777324
-rw-r--r--.  1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 10 hadoop hadoop       215 Feb 22 hadoop-3.3.2
-r--r--r--.  1 root   root   638660563 Aug  8 15:29 hadoop-3.3.2.tar.gz
drwxr-xr-x.  8 hadoop hadoop       273 Apr 26 jdk1.8.0_333
-r--r--r--.  1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x.  8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7

slave2

[root@slave2 ~]# cd /home/fosafer/hadoop/
[root@slave2 hadoop]# ll
total 9096
-rw-r--r--. 1 hadoop hadoop 9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 8 hadoop hadoop     161 Aug  8 15:20 zookeeper-3.5.7
[root@slave2 hadoop]# rz
ZMODEM Session started ------------------------ Sent jdk-8u333-linux-x64.tar.gz
[root@slave2 hadoop]# tar -vxf jdk-8u333-linux-x64.tar.gz
[root@slave2 hadoop]# ll
total 153632
-rw-r--r--. 1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 8  10143  10143       273 Apr 26 jdk1.8.0_333
-r--r--r--. 1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x. 8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7
[root@slave2 hadoop]# chown hadoop:hadoop -R jdk1.8.0_333
[root@slave2 hadoop]# ll
total 153632
-rw-r--r--. 1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 8 hadoop hadoop       273 Apr 26 jdk1.8.0_333
-r--r--r--. 1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x. 8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7
[root@slave2 hadoop]# rz
ZMODEM Session started ------------------------ Sent hadoop-3.3.2.tar.gz
[root@slave2 hadoop]# tar -vxf hadoop-3.3.2.tar.gz
[root@slave2 hadoop]# ll
total 777324
-rw-r--r--.  1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 10    501 dialout      215 Feb 22 hadoop-3.3.2
-r--r--r--.  1 root   root   638660563 Aug  8 15:29 hadoop-3.3.2.tar.gz
drwxr-xr-x.  8 hadoop hadoop       273 Apr 26 jdk1.8.0_333
-r--r--r--.  1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x.  8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7
[root@slave2 hadoop]# chown hadoop:hadoop -R hadoop-3.3.2
[root@slave2 hadoop]# ll
total 777324
-rw-r--r--.  1 hadoop hadoop   9311744 Aug  8 15:10 apache-zookeeper-3.5.7-bin.tar.gz
drwxr-xr-x. 10 hadoop hadoop       215 Feb 22 hadoop-3.3.2
-r--r--r--.  1 root   root   638660563 Aug  8 15:29 hadoop-3.3.2.tar.gz
drwxr-xr-x.  8 hadoop hadoop       273 Apr 26 jdk1.8.0_333
-r--r--r--.  1 root   root   148003421 Aug  8 15:26 jdk-8u333-linux-x64.tar.gz
drwxr-xr-x.  8 hadoop hadoop       161 Aug  8 15:20 zookeeper-3.5.7

6. Configure environment variables

Switch back to the hadoop user on all three servers and set the environment variables.

master

[root@master hadoop]# su hadoop
[hadoop@master hadoop]$ cd
[hadoop@master ~]$ ll -a
total 16
drwx------. 3 hadoop hadoop  95 Aug  8 14:32 .
drwxr-xr-x. 4 root   root    35 Aug  8 14:27 ..
-rw-------. 1 hadoop hadoop 614 Aug  8 14:59 .bash_history
-rw-r--r--. 1 hadoop hadoop  18 Apr  1 .bash_logout
-rw-r--r--. 1 hadoop hadoop 193 Apr  1 .bash_profile
-rw-r--r--. 1 hadoop hadoop 231 Apr  1 .bashrc
drwx------. 2 hadoop hadoop  80 Aug  8 14:55 .ssh
[hadoop@master ~]$ vim .bash_profile   # append at the end
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/.local/bin:$HOME/bin

export PATH

JAVA_HOME=/home/fosafer/hadoop/jdk1.8.0_333
HADOOP_HOME=/home/fosafer/hadoop/hadoop-3.3.2
ZOOKEEPER_HOME=/home/fosafer/hadoop/zookeeper-3.5.7
PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME HADOOP_HOME PATH
# save and quit
[hadoop@master ~]$ source .bash_profile
[root@master ~]# cd /home/fosafer/
[root@master fosafer]# ll
total 0
drwxr-xr-x. 3 hadoop hadoop 70 Aug  8 15:11 hadoop
[root@master fosafer]# chown hadoop:hadoop -R hadoop

slave1

[root@slave1 hadoop]# su hadoop
[hadoop@slave1 hadoop]$ cd
[hadoop@slave1 ~]$ ll -a
total 16
drwx------. 3 hadoop hadoop  95 Aug  8 14:32 .
drwxr-xr-x. 4 root   root    35 Aug  8 14:27 ..
-rw-------. 1 hadoop hadoop 614 Aug  8 14:59 .bash_history
-rw-r--r--. 1 hadoop hadoop  18 Apr  1 .bash_logout
-rw-r--r--. 1 hadoop hadoop 193 Apr  1 .bash_profile
-rw-r--r--. 1 hadoop hadoop 231 Apr  1 .bashrc
drwx------. 2 hadoop hadoop  80 Aug  8 14:55 .ssh
[hadoop@slave1 ~]$ vim .bash_profile   # append at the end
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/.local/bin:$HOME/bin

export PATH

JAVA_HOME=/home/fosafer/hadoop/jdk1.8.0_333
HADOOP_HOME=/home/fosafer/hadoop/hadoop-3.3.2
ZOOKEEPER_HOME=/home/fosafer/hadoop/zookeeper-3.5.7
PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME HADOOP_HOME PATH
# save and quit
[hadoop@slave1 ~]$ source .bash_profile
[root@slave1 ~]# cd /home/fosafer/
[root@slave1 fosafer]# ll
total 0
drwxr-xr-x. 3 hadoop hadoop 70 Aug  8 15:11 hadoop
[root@slave1 fosafer]# chown hadoop:hadoop -R hadoop

slave2

[root@slave2 hadoop]# su hadoop
[hadoop@slave2 hadoop]$ cd
[hadoop@slave2 ~]$ ll -a
total 16
drwx------. 3 hadoop hadoop  95 Aug  8 14:32 .
drwxr-xr-x. 4 root   root    35 Aug  8 14:27 ..
-rw-------. 1 hadoop hadoop 614 Aug  8 14:59 .bash_history
-rw-r--r--. 1 hadoop hadoop  18 Apr  1 .bash_logout
-rw-r--r--. 1 hadoop hadoop 193 Apr  1 .bash_profile
-rw-r--r--. 1 hadoop hadoop 231 Apr  1 .bashrc
drwx------. 2 hadoop hadoop  80 Aug  8 14:55 .ssh
[hadoop@slave2 ~]$ vim .bash_profile   # append at the end
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/.local/bin:$HOME/bin

export PATH

JAVA_HOME=/home/fosafer/hadoop/jdk1.8.0_333
HADOOP_HOME=/home/fosafer/hadoop/hadoop-3.3.2
ZOOKEEPER_HOME=/home/fosafer/hadoop/zookeeper-3.5.7
PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME HADOOP_HOME PATH
# save and quit
[hadoop@slave2 ~]$ source .bash_profile
[root@slave2 ~]# cd /home/fosafer/
[root@slave2 fosafer]# ll
total 0
drwxr-xr-x. 3 hadoop hadoop 70 Aug  8 15:11 hadoop
[root@slave2 fosafer]# chown hadoop:hadoop -R hadoop

7. Start ZooKeeper

Start ZooKeeper on all three machines as the hadoop user.

master — in this run the master node is elected leader

[hadoop@master ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/fosafer/hadoop/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@master ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/fosafer/hadoop/zookeeper-3.5.7/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader

slave1

[hadoop@slave1 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/fosafer/hadoop/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@slave1 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/fosafer/hadoop/zookeeper-3.5.7/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower

slave2

[hadoop@slave2 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/fosafer/hadoop/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@slave2 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/fosafer/hadoop/zookeeper-3.5.7/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
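To double-check the ensemble from any node, you can query each server's role with the srvr four-letter-word command (whitelisted by default in ZooKeeper 3.5; an assumption to verify against your 4lw.commands.whitelist setting). nc was installed in step 1.

for host in master slave1 slave2; do
  echo "== $host =="
  echo srvr | nc "$host" 2181 | grep Mode   # prints Mode: leader / follower
done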

8. Configure HDFS

Still working as the hadoop user.

master

# Note: do not add a trailing / to the HADOOP_HOME path below; with a trailing slash the generated log path becomes //logs
[hadoop@master ~]$ cat <<EOF>> /home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/fosafer/hadoop/jdk1.8.0_333/
export HADOOP_HOME=/home/fosafer/hadoop/hadoop-3.3.2
EOF

[hadoop@master ~]$ vim /home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- (stock Apache license header) -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
  </property>
</configuration>

Notes:

hadoop.http.staticuser.user: sets the user name used for access control when static content is served from the Hadoop web UIs.

fs.defaultFS: a key Hadoop property that defines the default file system URI, the base path for file operations in the cluster. Its value determines which file system Hadoop components target by default; setting it to hdfs://mycluster makes every component use the configured HDFS nameservice, which keeps the configuration simple and the file-operation behavior consistent.

hadoop.tmp.dir: the base path for temporary files and directories. While running, Hadoop writes a lot of temporary data and intermediate results to disk, and this property defines where. In a distributed environment each node should have its own hadoop.tmp.dir directory.

ha.zookeeper.quorum: used in Hadoop's HA configuration to give the ZooKeeper connection string that the key HA components use for state management and failover; master, slave1, and slave2 are the ZooKeeper hostnames and 2181 is ZooKeeper's default port.

[hadoop@master ~]$ vim /home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- (stock Apache license header) -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2,nn3</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>master:9820</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>slave1:9820</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>slave2:9820</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>master:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>slave1:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn3</name>
    <value>slave2:9870</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/data/journaldata/jn</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>10000</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- ipc retry settings (the original note places them with core-site.xml) to avoid ConnectException when contacting the JournalNode service -->
  <property>
    <name>ipc.client.connect.max.retries</name>
    <value>100</value>
    <description>Indicates the number of retries a client will make to establish a server connection.</description>
  </property>
  <property>
    <name>ipc.client.connect.retry.interval</name>
    <value>10000</value>
    <description>Indicates the number of milliseconds a client will wait for before retrying to establish a server connection.</description>
  </property>
</configuration>

Notes:

dfs.nameservices: the name of the HDFS nameservice, here mycluster.

dfs.permissions.enabled: whether HDFS permission checking is enabled. It is disabled here (false), so file and directory permission checks are skipped.

dfs.ha.namenodes.mycluster: the NameNode IDs in the HA cluster; here there are three: nn1, nn2, and nn3.

dfs.namenode.rpc-address.mycluster.nn1: the RPC address of nn1, in hostname:port form (likewise for nn2 and nn3).

dfs.namenode.http-address.mycluster.nn1: the HTTP address of nn1 (likewise for nn2 and nn3).

dfs.ha.automatic-failover.enabled: enables automatic failover; if one NameNode fails, the system switches to a standby NameNode automatically.

dfs.namenode.shared.edits.dir: the location of the shared edit log; the Quorum Journal Manager (QJM) is used here.

dfs.journalnode.edits.dir: the directory where each JournalNode stores its copy of the edit log.

dfs.client.failover.proxy.provider.mycluster: the failover proxy provider used by clients.

dfs.ha.fencing.methods: the method used for fencing (isolating a failed node); SSH (sshfence) is used here.

dfs.ha.fencing.ssh.private-key-files: the SSH private key file used for fencing.

dfs.namenode.handler.count: the number of handler threads per NameNode.

dfs.webhdfs.enabled: enables the WebHDFS service.

ipc.client.connect.max.retries: the maximum number of IPC client connection retries, which helps avoid connection problems when contacting the JournalNodes.

ipc.client.connect.retry.interval: the interval between IPC client connection retries to the JournalNodes.

The goal of this configuration is high availability for HDFS: when a NameNode fails, the cluster automatically switches to a standby NameNode, keeping HDFS continuously available. The properties cover the active/standby NameNodes, failover, and the shared edit log.

[hadoop@master ~]$ cat <<EOF> /home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop/workers
master
slave1
slave2
EOF
[hadoop@master ~]$ scp -r /home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop/* hadoop@slave1:/home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop/
[hadoop@master ~]$ scp -r /home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop/* hadoop@slave2:/home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop/

9. Start the JournalNodes

Start them as the hadoop user.

The JournalNode is a component of Hadoop HDFS that provides the shared edit log in high-availability (HA) deployments. In an HA setup, the active and standby NameNodes keep their state in sync and fail over through a shared edit log, and the JournalNodes store that log.

The main roles of the JournalNode are:

Shared edit-log storage: in an HDFS HA deployment, the NameNode writes an edit log recording every modification to the file system. These edits must reach all NameNodes so their state stays consistent; the JournalNodes store replicas of the log so that standby NameNodes can fetch and apply the operations and stay consistent with the active NameNode.

Failover support: when the active NameNode fails or a switch is requested, a standby NameNode pulls the latest edits from the JournalNodes and applies them to catch up to the active state. This lets failover complete faster and reduces the chance of data loss.

Fault tolerance: the JournalNodes use a distributed, replicated log store, which makes the edit log redundant and reliable. If one JournalNode fails, the remaining ones keep serving, preserving the data.

Note that the JournalNode is not a mandatory HDFS component; it is only used in HA deployments. Without HA, HDFS has no JournalNodes and stores the edit log directly on the local file system. Introducing JournalNodes raises HDFS availability and makes active/standby synchronization and failover more reliable and efficient.
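Each JournalNode listens on RPC port 8485, the port referenced by dfs.namenode.shared.edits.dir above. After starting them (next step) you can sanity-check a node like this (a sketch; nc was installed in step 1):

jps | grep JournalNode                                  # the JournalNode JVM is running
nc -z master 8485 && echo "JournalNode RPC reachable"   # its RPC port answers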

master

[hadoop@master ~]$ hdfs --daemon start journalnode
WARNING: /home/fosafer/hadoop/hadoop-3.3.2/logs does not exist. Creating.

slave1

[hadoop@slave1 ~]$ hdfs --daemon start journalnode
WARNING: /home/fosafer/hadoop/hadoop-3.3.2/logs does not exist. Creating.

slave2

[hadoop@slave2 ~]$ hdfs --daemon start journalnode
WARNING: /home/fosafer/hadoop/hadoop-3.3.2/logs does not exist. Creating.

10. Format the master node

master

[hadoop@master ~]$ hdfs namenode -format   # initialize the HDFS NameNode
......
-08-08 17:40:32,992 INFO namenode.FSImage: Allocated new BlockPoolId: BP-619171587-10.10.11.235-1691487632992
-08-08 17:40:33,008 INFO common.Storage: Storage directory /home/hadoop/data/tmp/dfs/name has been successfully formatted.
-08-08 17:40:33,104 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/data/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
-08-08 17:40:33,184 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/data/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
-08-08 17:40:33,194 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
-08-08 17:40:33,250 INFO namenode.FSNamesystem: Stopping services started for active state
-08-08 17:40:33,250 INFO namenode.FSNamesystem: Stopping services started for standby state
-08-08 17:40:33,253 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
-08-08 17:40:33,253 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/10.10.11.235
************************************************************/
[hadoop@master ~]$ hdfs zkfc -formatZK
......
-08-08 17:41:09,709 INFO common.X509Util: Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
-08-08 17:41:09,715 INFO zookeeper.ClientCnxnSocket: jute.maxbuffer value is 4194304 Bytes
-08-08 17:41:09,720 INFO zookeeper.ClientCnxn: zookeeper.request.timeout value is 0. feature enabled=
-08-08 17:41:09,731 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.10.11.235:2181. Will not attempt to authenticate using SASL (unknown error)
-08-08 17:41:09,739 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.10.11.235:54084, server: master/10.10.11.235:2181
-08-08 17:41:09,778 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.10.11.235:2181, sessionid = 0x100051f53590000, negotiated timeout = 10000
-08-08 17:41:09,781 INFO ha.ActiveStandbyElector: Session connected.
-08-08 17:41:09,807 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
-08-08 17:41:09,913 INFO zookeeper.ZooKeeper: Session: 0x100051f53590000 closed
-08-08 17:41:09,913 WARN ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x100051f53590000
-08-08 17:41:09,913 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x100051f53590000
-08-08 17:41:09,916 INFO tools.DFSZKFailoverController: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at master/10.10.11.235
************************************************************/
[hadoop@master ~]$ hdfs namenode
......
-08-08 17:41:44,380 INFO ipc.Server: IPC Server listener on 9820: starting
-08-08 17:41:44,394 INFO namenode.NameNode: NameNode RPC up at: master/10.10.11.235:9823
-08-08 17:41:44,396 INFO namenode.FSNamesystem: Starting services required for standby state
-08-08 17:41:44,400 INFO ha.EditLogTailer: Will roll logs on active node every 120 seconds.
-08-08 17:41:44,406 INFO ha.StandbyCheckpointer: Starting standby checkpoint thread...
Checkpointing active NN to possible NNs: [http://slave1:9870, http://slave2:9870]
Serving checkpoints at http://master:9870
# do not exit this process yet

After the namenode command is running on master, carry out the following on the two slaves.

slave1

[hadoop@slave1 ~]$ hdfs namenode -bootstrapStandby
......
-08-08 17:43:24,266 INFO common.Util: Combined time for file download and fsync to all disks took 0.00s. The file download took 0.00s at 0.00 KB/s. Synchronous (fsync) write to disk of /home/hadoop/data/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 took 0.00s.
-08-08 17:43:24,266 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 401 bytes.
-08-08 17:43:24,276 INFO ha.BootstrapStandby: Skipping InMemoryAliasMap bootstrap as it was not configured
-08-08 17:43:24,281 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave1/10.10.11.236
************************************************************/

slave2

[hadoop@slave2 ~]$ hdfs namenode -bootstrapStandby
......
-08-08 17:43:31,919 INFO common.Util: Combined time for file download and fsync to all disks took 0.00s. The file download took 0.00s at 0.00 KB/s. Synchronous (fsync) write to disk of /home/hadoop/data/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 took 0.00s.
-08-08 17:43:31,919 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 401 bytes.
-08-08 17:43:31,925 INFO ha.BootstrapStandby: Skipping InMemoryAliasMap bootstrap as it was not configured
-08-08 17:43:31,928 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave2/10.10.11.237
************************************************************/

master: the master log now shows the synchronization.

......
-08-08 17:41:44,380 INFO ipc.Server: IPC Server listener on 9820: starting
-08-08 17:41:44,394 INFO namenode.NameNode: NameNode RPC up at: master/10.10.11.235:9823
-08-08 17:41:44,396 INFO namenode.FSNamesystem: Starting services required for standby state
-08-08 17:41:44,400 INFO ha.EditLogTailer: Will roll logs on active node every 120 seconds.
-08-08 17:41:44,406 INFO ha.StandbyCheckpointer: Starting standby checkpoint thread...
Checkpointing active NN to possible NNs: [http://slave1:9870, http://slave2:9870]
Serving checkpoints at http://master:9870
-08-08 17:43:24,256 INFO namenode.TransferFsImage: Sending fileName: /home/hadoop/data/tmp/dfs/name/current/fsimage_0000000000000000000, fileSize: 401. Sent total: 401 bytes. Size of last segment intended to send: -1 bytes.
-08-08 17:43:31,912 INFO namenode.TransferFsImage: Sending fileName: /home/hadoop/data/tmp/dfs/name/current/fsimage_0000000000000000000, fileSize: 401. Sent total: 401 bytes. Size of last segment intended to send: -1 bytes.
-08-08 17:43:44,425 INFO ha.EditLogTailer: Triggering log roll on remote NameNode
-08-08 17:43:54,486 INFO ipc.Client: Retrying connect to server: slave1/10.10.11.236:9820. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)
-08-08 17:44:04,488 INFO ipc.Client: Retrying connect to server: slave1/10.10.11.236:9820. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)
-08-08 17:44:14,490 INFO ipc.Client: Retrying connect to server: slave1/10.10.11.236:9820. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)
-08-08 17:44:24,492 INFO ipc.Client: Retrying connect to server: slave1/10.10.11.236:9820. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)
-08-08 17:44:34,494 INFO ipc.Client: Retrying connect to server: slave1/10.10.11.236:9820. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=100, sleepTime=10000 MILLISECONDS)
# after the sync finishes, press Ctrl+C to exit the process

When the synchronization is done, stop the JournalNodes on all nodes.

all

hdfs --daemon stop journalnode
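Rather than logging in to every node, this can be run once from master (a sketch; it relies on the passwordless SSH from step 3 and the PATH set in .bash_profile):

# stop the JournalNode on every node in one pass
for host in master slave1 slave2; do
  ssh "hadoop@$host" 'source ~/.bash_profile && hdfs --daemon stop journalnode'
done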

After that, start HDFS on all nodes.

all — some daemons may fail to start on the first attempt; restarting them is enough

[hadoop@master ~]$ start-dfs.sh
Starting namenodes on [master slave1 slave2]
master: Warning: Permanently added 'master,10.10.11.235' (ECDSA) to the list of known hosts.
Starting datanodes
Starting journal nodes [slave2 slave1 master]
Starting ZK Failover Controllers on NN hosts [master slave1 slave2]
[hadoop@master ~]$

[hadoop@slave1 ~]$ start-dfs.sh
Starting namenodes on [master slave1 slave2]
slave1: Warning: Permanently added 'slave1,10.10.11.236' (ECDSA) to the list of known hosts.
slave2: namenode is running as process 19387. Stop it first and ensure /tmp/hadoop-hadoop-namenode.pid file is empty before retry.
master: namenode is running as process 15656. Stop it first and ensure /tmp/hadoop-hadoop-namenode.pid file is empty before retry.
slave1: namenode is running as process 21955. Stop it first and ensure /tmp/hadoop-hadoop-namenode.pid file is empty before retry.
Starting datanodes
Starting journal nodes [slave2 slave1 master]
Starting ZK Failover Controllers on NN hosts [master slave1 slave2]

[hadoop@slave2 ~]$ start-dfs.sh
Starting namenodes on [master slave1 slave2]
slave2: Warning: Permanently added 'slave2,10.10.11.237' (ECDSA) to the list of known hosts.
slave1: Warning: Permanently added 'slave1,10.10.11.236' (ECDSA) to the list of known hosts.
master: Warning: Permanently added 'master,10.10.11.235' (ECDSA) to the list of known hosts.
slave2: namenode is running as process 19387. Stop it first and ensure /tmp/hadoop-hadoop-namenode.pid file is empty before retry.
master: namenode is running as process 15656. Stop it first and ensure /tmp/hadoop-hadoop-namenode.pid file is empty before retry.
slave1: namenode is running as process 21955. Stop it first and ensure /tmp/hadoop-hadoop-namenode.pid file is empty before retry.
Starting datanodes
Starting journal nodes [slave2 slave1 master]
Starting ZK Failover Controllers on NN hosts [master slave1 slave2]

Once started, open IP:9870 in a browser.

Note: all three nodes are NameNodes.

Now test high availability by stopping the node in the active state; per the screenshots above, the active node is slave1.

[hadoop@slave1 ~]$ hdfs --daemon stop namenode

After stopping it, check the states again: the active role has moved. This can also be confirmed from the shell, as sketched below.
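A shell check of the transition, using the NameNode IDs defined in hdfs-site.xml (a sketch; which node ends up active may differ in your run):

hdfs haadmin -getServiceState nn1   # e.g. active after the switch
hdfs haadmin -getServiceState nn3   # e.g. standby
hdfs haadmin -getAllServiceState    # one-line summary of every NameNode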

master

Switch to the hadoop user and test uploading a file.

[root@master ~]# su hadoop
[hadoop@master root]$ cd
[hadoop@master ~]$ ll
total 0
drwxrwxr-x. 4 hadoop hadoop 36 Aug  8 17:40 data
[hadoop@master ~]$ cat <<EOF> text.txt
> 123456
> asdfg
> qwert
> EOF
[hadoop@master ~]$ cat text.txt
123456
asdfg
qwert
[hadoop@master ~]$ source .bash_profile   # load the environment variables first
[hadoop@master ~]$ hd
hdfs         hdfs.cmd     hdsploader
[hadoop@master ~]$ hdfs dfs -mkdir -p /fosafer/test
[hadoop@master ~]$ hdfs dfs -put text.txt /fosafer/test
[hadoop@master ~]$ hdfs dfs -cat /fosafer/test/text.txt   # upload succeeded
123456
asdfg
qwert

Open the web page at IP:9870.

Enter the path to view the file created under it.

11. Configure YARN

master

Work as the hadoop user.

[hadoop@master ~]$ cd /home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop/
[hadoop@master hadoop]$ vim mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- (stock Apache license header) -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>/home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/common/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/common/lib/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/hdfs/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/hdfs/lib/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/mapreduce/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/mapreduce/lib/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/yarn/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/yarn/lib/*</value>
  </property>
</configuration>

Notes:

mapreduce.application.classpath: configures the runtime classpath of Hadoop MapReduce jobs so that a job can reach the classes and dependencies it needs. This matters for handling job dependencies and isolating job environments.
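A related check: hadoop classpath prints the classpath the Hadoop client resolves, which you can compare against the value above (a sketch):

hadoop classpath          # the resolved classpath, wildcards unexpanded
hadoop classpath --glob   # with wildcards expanded to the actual JARs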

[hadoop@master hadoop]$ cat yarn-site.xml

<?xml version="1.0"?>
<!-- (stock Apache license header) -->
<configuration>
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-rm-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <description>The class to use as the persistent store.</description>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk.state-store.address</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>master:8034</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>slave1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>slave1:8034</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>slave1:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

Notes:

yarn.resourcemanager.connect.retry-interval.ms: the retry interval (in milliseconds) for connections to the ResourceManager.

yarn.resourcemanager.ha.enabled: enables ResourceManager high availability.

yarn.resourcemanager.ha.automatic-failover.enabled: enables automatic failover.

yarn.resourcemanager.ha.automatic-failover.embedded: enables the embedded automatic-failover mechanism.

yarn.resourcemanager.cluster-id: the unique identifier of the ResourceManager cluster.

yarn.resourcemanager.ha.rm-ids: the list of ResourceManager IDs used in the HA configuration.

yarn.resourcemanager.hostname.rm1 and yarn.resourcemanager.hostname.rm2: the hostname (or IP) bound to each ResourceManager ID.

yarn.resourcemanager.recovery.enabled: enables the ResourceManager recovery mechanism.

yarn.resourcemanager.store.class: the persistent state store; ZooKeeper is used here.

yarn.resourcemanager.zk.state-store.address and yarn.resourcemanager.zk-address: the ZooKeeper addresses used for the HA state store.

yarn.resourcemanager.address.rm1 and yarn.resourcemanager.address.rm2: the address and port of each ResourceManager.

yarn.resourcemanager.scheduler.address.rm1 and yarn.resourcemanager.scheduler.address.rm2: the address and port of each ResourceManager's scheduler.

yarn.resourcemanager.webapp.address.rm1 and yarn.resourcemanager.webapp.address.rm2: the address and port of each ResourceManager's web UI.

yarn.nodemanager.aux-services: additional NodeManager services; here the MapReduce Shuffle service.

yarn.nodemanager.aux-services.mapreduce_shuffle.class: the implementation class of the MapReduce Shuffle service.
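After YARN is started (below), you can confirm that the NodeManagers registered and which ResourceManager is active (a sketch):

yarn node -list                     # should list master, slave1 and slave2 as RUNNING
yarn rmadmin -getAllServiceState    # HA state of rm1 and rm2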

Copy the two files to the slaves as the hadoop user.

[hadoop@master hadoop]$ scp mapred-site.xml yarn-site.xml hadoop@slave1:$PWD/
mapred-site.xml    100% 1589   856.2KB/s   00:00
yarn-site.xml      100% 3003     1.8MB/s   00:00
[hadoop@master hadoop]$ scp mapred-site.xml yarn-site.xml hadoop@slave2:$PWD/
mapred-site.xml    100% 1589   817.1KB/s   00:00
yarn-site.xml      100% 3003     1.4MB/s   00:00

Start the ResourceManager and the NodeManager.

[hadoop@master hadoop]$ cd
[hadoop@master ~]$ source .bash_profile
[hadoop@master ~]$ yarn --daemon start resourcemanager
[hadoop@master ~]$ yarn --daemon start nodemanager

slave1

Switch to the hadoop user and start the ResourceManager and NodeManager.

[root@slave1 ~]# su hadoop
[hadoop@slave1 root]$ cd
[hadoop@slave1 ~]$ source .bash_profile
[hadoop@slave1 ~]$ yarn --daemon start resourcemanager
[hadoop@slave1 ~]$ yarn --daemon start nodemanager

slave2

Switch to the hadoop user and start the NodeManager.

[root@slave2 ~]# su hadoop
[hadoop@slave2 root]$ cd
[hadoop@slave2 ~]$ source .bash_profile
[hadoop@slave2 ~]$ yarn --daemon start nodemanager

Test access

Open IP:8088/cluster and click the "3" (the active-node count).

master

Check the ResourceManager states

[hadoop@master ~]$ yarn rmadmin -getServiceState rm1
active
[hadoop@master ~]$ yarn rmadmin -getServiceState rm2
standby
# active = currently serving, standby = backup
# note: rm1 is master, rm2 is slave1

Next, stop the ResourceManager on master and check whether the active state moves.

[hadoop@master ~]$ yarn --daemon stop resourcemanager
[hadoop@master ~]$ yarn rmadmin -getServiceState rm2   # after the stop, the active state has moved
active
[hadoop@master ~]$ yarn rmadmin -getServiceState rm1   # querying the stopped master now reports connection refused
-08-09 10:20:56,565 INFO ipc.Client: Retrying connect to server: master/10.10.11.235:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From master/10.10.11.235 to master:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: /hadoop/ConnectionRefused
[hadoop@master ~]$ yarn --daemon start resourcemanager   # restart the service and check the states again
[hadoop@master ~]$ yarn rmadmin -getServiceState rm1
standby
[hadoop@master ~]$ yarn rmadmin -getServiceState rm2
active

Note: when opening the Hadoop web pages, use the IP of whichever node is currently active. Since the active role has switched here, the page on master is no longer reachable and the page on slave1 must be used instead.
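
Rather than guessing which web UI to open, the ResourceManager's REST endpoint reports its own HA state. A sketch, assuming curl is available and the 8088 web ports configured above:

for host in master slave1; do
  echo -n "$host: "
  # /ws/v1/cluster/info includes a haState field (ACTIVE or STANDBY)
  curl -s "http://$host:8088/ws/v1/cluster/info" | grep -o '"haState":"[A-Z]*"' || echo "no response"
done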

Test run: wordcount
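
The run below assumes the input file /fosafer/test/text.txt was already uploaded earlier in this guide. On a fresh cluster, an equivalent input can be prepared along these lines (the three sample words match the output shown below):

printf '123456\nasdfg\nqwert\n' > text.txt            # one word per line
hdfs dfs -mkdir -p /fosafer/test                      # create the input directory
hdfs dfs -put -f text.txt /fosafer/test/text.txt      # upload, overwriting if present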

[hadoop@master ~]$ hadoop jar /home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.2.jar wordcount /fosafer/test/text.txt /fosafer/tmp   # the client waits for the RM failover to complete
2023-08-09 10:24:41,872 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2023-08-09 10:24:42,126 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1691547639619_0001
2023-08-09 10:24:42,456 INFO input.FileInputFormat: Total input files to process : 1
2023-08-09 10:24:42,597 INFO mapreduce.JobSubmitter: number of splits:1
2023-08-09 10:24:42,783 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1691547639619_0001
2023-08-09 10:24:42,785 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-08-09 10:24:42,957 INFO conf.Configuration: resource-types.xml not found
2023-08-09 10:24:42,957 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-08-09 10:24:43,450 INFO impl.YarnClientImpl: Submitted application application_1691547639619_0001
2023-08-09 10:24:43,498 INFO mapreduce.Job: The url to track the job: http://slave1:8088/proxy/application_1691547639619_0001/
2023-08-09 10:24:43,498 INFO mapreduce.Job: Running job: job_1691547639619_0001
2023-08-09 10:24:50,616 INFO mapreduce.Job: Job job_1691547639619_0001 running in uber mode : false
2023-08-09 10:24:50,617 INFO mapreduce.Job:  map 0% reduce 0%
2023-08-09 10:24:56,716 INFO mapreduce.Job:  map 100% reduce 0%
2023-08-09 10:25:01,762 INFO mapreduce.Job:  map 100% reduce 100%
2023-08-09 10:25:01,776 INFO mapreduce.Job: Job job_1691547639619_0001 completed successfully
2023-08-09 10:25:01,916 INFO mapreduce.Job: Counters: 54
        File System Counters
                FILE: Number of bytes read=43
                FILE: Number of bytes written=560729
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=122
                HDFS: Number of bytes written=25
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
                HDFS: Number of bytes read erasure-coded=0
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3489
                Total time spent by all reduces in occupied slots (ms)=2680
                Total time spent by all map tasks (ms)=3489
                Total time spent by all reduce tasks (ms)=2680
                Total vcore-milliseconds taken by all map tasks=3489
                Total vcore-milliseconds taken by all reduce tasks=2680
                Total megabyte-milliseconds taken by all map tasks=3572736
                Total megabyte-milliseconds taken by all reduce tasks=2744320
        Map-Reduce Framework
                Map input records=3
                Map output records=3
                Map output bytes=31
                Map output materialized bytes=43
                Input split bytes=103
                Combine input records=3
                Combine output records=3
                Reduce input groups=3
                Reduce shuffle bytes=43
                Reduce input records=3
                Reduce output records=3
                Spilled Records=6
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=125
                CPU time spent (ms)=1680
                Physical memory (bytes) snapshot=703270912
                Virtual memory (bytes) snapshot=5685387264
                Total committed heap usage (bytes)=753401856
                Peak Map Physical memory (bytes)=388775936
                Peak Map Virtual memory (bytes)=2838519808
                Peak Reduce Physical memory (bytes)=314494976
                Peak Reduce Virtual memory (bytes)=2846867456
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=19
        File Output Format Counters
                Bytes Written=25
[hadoop@master ~]$ hdfs dfs -cat /fosafer/tmp/*   # view the result once the job finishes
123456  1
asdfg   1
qwert   1
[hadoop@master ~]$

At this point, the Hadoop cluster deployment is complete!

12. Enabling the Hadoop HistoryServer

Hadoop ships with a history server through which you can view records of completed MapReduce jobs in a web UI.

By default the history server is not started; you have to start it yourself.

Once it is running, clicking History on a finished job in the YARN web UI jumps to the history server.

master

Switch to the hadoop user and configure the history server

[hadoop@master ~]$ cd /home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop/
[hadoop@master hadoop]$ vim mapred-site.xml   # add the two jobhistory properties
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>/home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/common/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/common/lib/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/hdfs/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/hdfs/lib/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/mapreduce/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/mapreduce/lib/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/yarn/*,/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/yarn/lib/*</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>slave1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>slave1:19888</value>
  </property>
</configuration>

Notes:

mapreduce.jobhistory.address: the RPC address and port of the history server; it only needs to be configured on (and point at) one node.

mapreduce.jobhistory.webapp.address: the web UI address and port of the history server; likewise, only one node is needed.

[hadoop@master hadoop]$ cat yarn-site.xml   # the full file, now with the log-aggregation properties appended
<?xml version="1.0"?>
<!-- Apache License header as above -->
<configuration>
  <!-- ... all of the ResourceManager HA and shuffle properties shown earlier, unchanged ... -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://slave1:19888/jobhistory/logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>

Notes:

yarn.log-aggregation-enable: whether to enable log aggregation.

yarn.log.server.url: the URL of the log server (the jobhistory web endpoint configured above).

yarn.log-aggregation.retain-seconds: how long aggregated logs are kept; 604800 seconds is 7 days.

# Distribute the updated configs to slave1 and slave2 with scp, then restart the whole cluster
[hadoop@master hadoop]$ scp mapred-site.xml hadoop@slave1:$PWD/
[hadoop@master hadoop]$ scp mapred-site.xml hadoop@slave2:$PWD/
[hadoop@master hadoop]$ scp yarn-site.xml hadoop@slave1:$PWD/
[hadoop@master hadoop]$ scp yarn-site.xml hadoop@slave2:$PWD/
yarn-site.xml
[hadoop@master hadoop]$ cd
[hadoop@master ~]$ source .bash_profile
[hadoop@master ~]$ stop-all.sh
WARNING: Stopping all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: Use CTRL-C to abort.
Stopping namenodes on [master slave1 slave2]
Stopping datanodes
Stopping journal nodes [slave2 slave1 master]
Stopping ZK Failover Controllers on NN hosts [master slave1 slave2]
Stopping nodemanagers
Stopping resourcemanagers on [ master slave1]
[hadoop@master ~]$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [master slave1 slave2]
Starting datanodes
Starting journal nodes [slave2 slave1 master]
Starting ZK Failover Controllers on NN hosts [master slave1 slave2]
Starting resourcemanagers on [ master slave1]
Starting nodemanagers

slave1

Start the history server on slave1

[root@slave1 ~]# su hadoop
[hadoop@slave1 root]$ cd
[hadoop@slave1 ~]$ source .bash_profile
[hadoop@slave1 ~]$ mapred --daemon start historyserver
# Confirm it is running
[hadoop@slave1 ~]$ jps
22834 JournalNode
5 ResourceManager
21256 QuorumPeerMain
19608 NameNode
23241 DFSZKFailoverController
20780 Jps
22285 DataNode
20285 NodeManager
20733 JobHistoryServer   # this is the history server

After it starts, visit IP:19888. Since the history server is configured on slave1, use slave1's IP here.
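
The history server also exposes a small REST API, which is convenient for checking from a terminal instead of a browser. A sketch against the slave1:19888 address configured above:

[hadoop@master ~]$ curl -s http://slave1:19888/ws/v1/history/info
# expected: a short JSON document with the Hadoop version and the server's start time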

master

Test the feature and confirm jobs are recorded

# First remove the previous output files; the client retries against the standby NameNode first, which is harmless
[hadoop@master ~]$ hdfs dfs -rm /fosafer/tmp/*
2023-08-09 15:06:56,240 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2094)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1550)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4040)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:1174)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:752)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:604)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:572)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:556)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1043)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:971)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2976)
, while invoking ClientNamenodeProtocolTranslatorPB.getListing over slave1/10.10.11.236:9820 after 1 failover attempts. Trying to failover after sleeping for 1146ms.
Deleted /fosafer/tmp/_SUCCESS
Deleted /fosafer/tmp/part-r-00000
[hadoop@master ~]$ hdfs dfs -rm -r /fosafer/tmp   # add -r to delete the directory itself
2023-08-09 15:11:34,140 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
        [the same stack trace as above, this time through FSNamesystem.getFileInfo(FSNamesystem.java:3342) and NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1208)]
, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over slave1/10.10.11.236:9820 after 1 failover attempts. Trying to failover after sleeping for 579ms.
Deleted /fosafer/tmp
# Then rerun wordcount on the text.txt file in the data directory created earlier
[hadoop@master ~]$ hadoop jar /home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.2.jar wordcount /fosafer/test/text.txt /fosafer/tmp
2023-08-09 15:12:30,739 INFO retry.RetryInvocationHandler: [the same getFileInfo StandbyException retry as above] Trying to failover after sleeping for 565ms.
2023-08-09 15:12:31,397 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2023-08-09 15:12:31,683 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1691564369682_0001
2023-08-09 15:12:32,443 INFO input.FileInputFormat: Total input files to process : 1
2023-08-09 15:12:32,573 INFO mapreduce.JobSubmitter: number of splits:1
2023-08-09 15:12:32,726 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1691564369682_0001
2023-08-09 15:12:32,728 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-08-09 15:12:32,777 INFO retry.RetryInvocationHandler: [the same retry] Trying to failover after sleeping for 1355ms.
2023-08-09 15:12:34,244 INFO retry.RetryInvocationHandler: [the same retry] Trying to failover after sleeping for 774ms.
2023-08-09 15:12:35,089 INFO conf.Configuration: resource-types.xml not found
2023-08-09 15:12:35,089 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-08-09 15:12:35,392 INFO impl.YarnClientImpl: Submitted application application_1691564369682_0001
2023-08-09 15:12:35,441 INFO mapreduce.Job: The url to track the job: http://slave1:8088/proxy/application_1691564369682_0001/
2023-08-09 15:12:35,442 INFO mapreduce.Job: Running job: job_1691564369682_0001   # this job id is the name the history server will show
2023-08-09 15:12:47,616 INFO mapreduce.Job: Job job_1691564369682_0001 running in uber mode : false
2023-08-09 15:12:47,618 INFO mapreduce.Job:  map 0% reduce 0%
2023-08-09 15:12:57,730 INFO mapreduce.Job:  map 100% reduce 0%
2023-08-09 15:13:04,776 INFO mapreduce.Job:  map 100% reduce 100%
2023-08-09 15:13:04,788 INFO mapreduce.Job: Job job_1691564369682_0001 completed successfully
2023-08-09 15:13:04,940 INFO mapreduce.Job: Counters: 54
        File System Counters
                FILE: Number of bytes read=43
                FILE: Number of bytes written=561063
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=122
                HDFS: Number of bytes written=25
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
                HDFS: Number of bytes read erasure-coded=0
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=7318
                Total time spent by all reduces in occupied slots (ms)=4832
                Total time spent by all map tasks (ms)=7318
                Total time spent by all reduce tasks (ms)=4832
                Total vcore-milliseconds taken by all map tasks=7318
                Total vcore-milliseconds taken by all reduce tasks=4832
                Total megabyte-milliseconds taken by all map tasks=7493632
                Total megabyte-milliseconds taken by all reduce tasks=4947968
        Map-Reduce Framework
                Map input records=3
                Map output records=3
                Map output bytes=31
                Map output materialized bytes=43
                Input split bytes=103
                Combine input records=3
                Combine output records=3
                Reduce input groups=3
                Reduce shuffle bytes=43
                Reduce input records=3
                Reduce output records=3
                Spilled Records=6
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=199
                CPU time spent (ms)=2390
                Physical memory (bytes) snapshot=927694848
                Virtual memory (bytes) snapshot=5690384384
                Total committed heap usage (bytes)=1214775296
                Peak Map Physical memory (bytes)=604606464
                Peak Map Virtual memory (bytes)=2844090368
                Peak Reduce Physical memory (bytes)=323088384
                Peak Reduce Virtual memory (bytes)=2846294016
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=19
        File Output Format Counters
                Bytes Written=25

Check the history server page again; a new entry appears whose name matches the job that just ran.

Click the job name.

Click logs to view the job's logs.
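
With yarn.log-aggregation-enable switched on, the same logs can also be pulled from the command line, which is often faster than the web UI. A sketch using the job from the test above:

[hadoop@master ~]$ yarn application -list -appStates FINISHED                        # find the application id
[hadoop@master ~]$ yarn logs -applicationId application_1691564369682_0001 | less   # dump its aggregated logs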

四、Troubleshooting

1. Operation category READ is not supported in state standby

If the error Operation category READ is not supported in state standby appears:

# Force nn2 and nn3 to standby and nn1 to active
$ hdfs haadmin -transitionToStandby --forcemanual nn2
$ hdfs haadmin -transitionToStandby --forcemanual nn3
$ hdfs haadmin -transitionToActive --forcemanual nn1
# Check nn1's state
$ hdfs haadmin -getServiceState nn1
active   # it should now report active
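
To confirm the fix, the state of every NameNode can be checked in one pass. A minimal sketch; the nn1/nn2/nn3 ids must match dfs.ha.namenodes.<nameservice> in your hdfs-site.xml:

for nn in nn1 nn2 nn3; do
  echo "$nn: $(hdfs haadmin -getServiceState "$nn")"   # exactly one should report active
done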

2. Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

If the error "could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster" appears:

# Edit yarn-site.xml
$ cd /home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop
$ vim yarn-site.xml
# Append the following; after editing, remember to copy the file to slave1 and slave2
<property>
  <name>yarn.application.classpath</name>
  <value>/home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/common/lib/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/common/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/hdfs:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/hdfs/lib/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/hdfs/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/mapreduce/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/yarn:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/yarn/lib/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/yarn/*</value>
</property>
# Note: the value of yarn.application.classpath above is the output of:
$ hadoop classpath
/home/fosafer/hadoop/hadoop-3.3.2/etc/hadoop:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/common/lib/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/common/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/hdfs:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/hdfs/lib/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/hdfs/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/mapreduce/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/yarn:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/yarn/lib/*:/home/fosafer/hadoop/hadoop-3.3.2/share/hadoop/yarn/*
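
Since the value is simply the output of hadoop classpath, the property block can be generated rather than hand-copied. A sketch (paste the printed block into yarn-site.xml on every node):

CP=$(hadoop classpath)   # the live classpath of this installation
cat <<EOF
<property>
  <name>yarn.application.classpath</name>
  <value>${CP}</value>
</property>
EOF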

3. The IP:9870 pages all show Standby and never fail over automatically

In cluster mode, if every NameNode on the IP:9870 pages shows Standby, or if after stopping the active node during an HA test no standby takes over and everything stays in the Standby state:

# Install psmisc, the package that provides the fuser program (already installed at the beginning of this guide)
yum -y install psmisc
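
To verify the package on all three nodes without logging into each one, a loop over SSH can help. A sketch relying on the passwordless SSH configured earlier (the install itself may need root):

for h in master slave1 slave2; do
  ssh "$h" 'command -v fuser >/dev/null && echo "$(hostname): fuser ok" || echo "$(hostname): psmisc missing"'
done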

五、Optional configuration

yarn-site.xml

<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>-1</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>-1</value>
</property>
<property>
  <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
  <value>true</value>
</property>

yarn.nodemanager.resource.cpu-vcores

Purpose: the number of virtual CPU cores (vCores) available on each NodeManager. Value: a positive integer, or -1, in which case the NodeManager uses all virtual cores available on the host.

yarn.nodemanager.resource.memory-mb

Purpose: the amount of memory (in MB) available on each NodeManager. Value: a positive integer, or -1, in which case the NodeManager uses all memory available on the host.

yarn.nodemanager.resource.detect-hardware-capabilities

Purpose: whether to automatically detect the hardware capabilities of the NodeManager host so that resources can be configured more accurately. Value: true or false; when true, the NodeManager tries to detect the hardware and sets the vCore and memory values from the result.
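
After changing these values and restarting YARN, the resources each NodeManager actually registered can be read back from the ResourceManager REST API. A sketch, assuming curl and python3 are available (point it at whichever ResourceManager is active):

curl -s http://master:8088/ws/v1/cluster/nodes | python3 -c '
import json, sys
for n in json.load(sys.stdin)["nodes"]["node"]:            # one entry per NodeManager
    print(n["nodeHostName"], n["availMemoryMB"], "MB free,",
          n["availableVirtualCores"], "vcores free")'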

mapred-site.xml

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_CLASSPATH}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_CLASSPATH}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_CLASSPATH}</value>
</property>
<property>
  <name>mapreduce.admin.user.env</name>
  <value></value>
</property>

yarn.app.mapreduce.am.env

Purpose: environment variables for the MapReduce application manager (ApplicationMaster). Value: HADOOP_MAPRED_HOME=${HADOOP_CLASSPATH} sets HADOOP_MAPRED_HOME to ${HADOOP_CLASSPATH}, a special variable holding the Hadoop classpath, i.e. the output of the hadoop classpath command.

mapreduce.map.env and mapreduce.reduce.env

Purpose: environment variables for Map tasks and Reduce tasks respectively. Value: HADOOP_MAPRED_HOME=${HADOOP_CLASSPATH} sets HADOOP_MAPRED_HOME to ${HADOOP_CLASSPATH}, i.e. the output of the hadoop classpath command.

mapreduce.admin.user.env

Purpose: environment variables for the administrator user of MapReduce jobs. Value: a colon-separated list of environment paths covering libraries and third-party dependencies; adjust it to your own use case.

六、Common commands

Start/stop commands

Start the Hadoop cluster: start-all.sh, or start-dfs.sh plus start-yarn.sh

Stop the Hadoop cluster: stop-all.sh, or stop-dfs.sh plus stop-yarn.sh
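
After a start or stop, a quick way to confirm which daemons are up on every node is to run jps over SSH. A sketch relying on the passwordless SSH configured earlier:

for h in master slave1 slave2; do
  echo "== $h =="
  ssh hadoop@"$h" 'jps | grep -v Jps | sort -k2'   # list daemons, sorted by class name
done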

HDFS operations

Create an HDFS directory: hdfs dfs -mkdir /path/to/directory

List an HDFS directory: hdfs dfs -ls /path/to/directory

Upload a file to HDFS: hdfs dfs -put localfile hdfsfile

Download a file from HDFS: hdfs dfs -get hdfsfile localfile

Delete an HDFS file or directory: hdfs dfs -rm -r /path/to/file_or_directory
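
A short round trip exercising the commands above (the /tmp/demo paths are only examples):

echo "hello hdfs" > demo.txt
hdfs dfs -mkdir -p /tmp/demo                  # create a directory
hdfs dfs -put demo.txt /tmp/demo/             # upload
hdfs dfs -ls /tmp/demo                        # list it
hdfs dfs -get /tmp/demo/demo.txt copy.txt     # download under a new name
hdfs dfs -rm -r /tmp/demo                     # clean up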

YARN operations

List YARN cluster nodes: yarn node -list

List running YARN applications: yarn application -list

Show the status of an application: yarn application -status <application_id>

Kill an application: yarn application -kill <application_id>

MapReduce operations

Run a MapReduce job: hadoop jar <jar_path> <main_class> <input_path> <output_path>

Check a job's status: mapred job -status <job_id>

Kill a job: mapred job -kill <job_id>

Configuration and information queries

Query an HDFS configuration value: hdfs getconf -confKey <property.name>

View an HDFS status report: hdfs dfsadmin -report

View HDFS capacity: hdfs dfs -df -h

Other common commands

Format HDFS (first start only; this wipes NameNode metadata): hdfs namenode -format

Start or stop a single Hadoop daemon: hdfs --daemon start|stop <daemon_name>, or yarn --daemon start|stop <daemon_name>, as used throughout this guide

Fetch an application's aggregated logs: yarn logs -applicationId <application_id>
