Setting Up a Hadoop Distributed Environment
Environment
- CentOS 8
- JDK: 1.8.0_231
- Hadoop: 3.2.1
Host Preparation
Set the hostname to hadoop:
```bash
vi /etc/hostname
```
Map the hostname to the IP address:
```bash
vi /etc/hosts
# add a line of the form: <ip-address> hadoop
```
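As a concrete example, on this machine the entry would be the following line (the node's IP address, 192.168.1.103, appears in the `hdfs dfsadmin -report` output later in this guide):
```
192.168.1.103 hadoop
```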
Disable the firewall
Check that firewalld is not running:
```bash
systemctl status firewalld
```
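If the service is still active, it can be stopped and prevented from starting at boot with the standard systemd commands (run as root):
```bash
systemctl stop firewalld
systemctl disable firewalld
```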
Create the hadoop user
```bash
useradd hadoop
passwd hadoop
```
Configure the hadoop user's environment variables (the JDK) in /home/hadoop/.bash_profile; the full file is shown in the Hadoop environment variables step below.
Installing Hadoop
Configure passwordless SSH login for the hadoop user:
1. Generate a key pair and add it to the authorized keys:
```bash
cd ~
ssh-keygen -t rsa
cd .ssh
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
chmod 700 ~/.ssh/
```
2. Verify passwordless login:
```bash
ssh hadoop
exit
```
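As an aside, the append-and-chmod steps above can also be done with a single command, if the ssh-copy-id helper is installed (a convenience sketch, not part of the original walkthrough):
```bash
# copies id_rsa.pub into authorized_keys on the target host (here, this same host) and fixes permissions
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop
```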
Upload the archive, extract it, and move it to /usr/:
```bash
tar zxvf hadoop-3.2.1.tar.gz
su
mv /home/hadoop/tools/hadoop-3.2.1 /usr/
su - hadoop
```
Hadoop environment variables
Edit ~/.bash_profile:
```bash
$ vi ~/.bash_profile

JAVA_HOME=/usr/jdk1.8.0_231
HADOOP_HOME=/usr/hadoop-3.2.1
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH
export JAVA_HOME
export HADOOP_HOME
export PATH

$ source .bash_profile
```
Basic Hadoop configuration: hadoop-env.sh
```bash
$ cd /usr/hadoop-3.2.1/etc/hadoop
$ vi hadoop-env.sh
# line 54:
export JAVA_HOME=/usr/jdk1.8.0_231
```
Test that the basic configuration works:
```bash
$ hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /usr/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
```
Prepare directories
Create /usr/local/hadoop, used as the temporary directory for job execution and for data storage:
```bash
cd /usr/local/
mkdir hadoop          # create the directory (implied by the chown below)
chown hadoop:hadoop hadoop
```
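Hadoop creates hadoop.tmp.dir and the dfs.* directories itself when the NameNode is formatted and the daemons start, but they can also be pre-created explicitly; a sketch, assuming the paths used in the configuration files below:
```bash
mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data
```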
Core configuration
1. core-site.xml (`$HADOOP_HOME/etc/hadoop/core-site.xml`)
```bash
$ vi /usr/hadoop-3.2.1/etc/hadoop/core-site.xml
```
```xml
<!-- default filesystem URI -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop:9000</value>
</property>
<!-- base directory for temporary files -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
</property>
```
2. hdfs-site.xml (`$HADOOP_HOME/etc/hadoop`)
```xml
<!-- number of replicas for each file stored on HDFS -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<!-- HDFS web UI listen address and port -->
<property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop:9870</value>
</property>
<!-- NameNode data storage path -->
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/dfs/name</value>
</property>
<!-- DataNode data storage path -->
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/dfs/data</value>
</property>
```
3. mapred-site.xml (`$HADOOP_HOME/etc/hadoop`)
```xml
<!-- run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<!-- classpath for MapReduce applications -->
<property>
    <name>mapreduce.application.classpath</name>
    <value>
        /usr/hadoop-3.2.1/etc/hadoop:/usr/hadoop-3.2.1/share/hadoop/common/lib/*:/usr/hadoop-3.2.1/share/hadoop/common/*:/usr/hadoop-3.2.1/share/hadoop/hdfs:/usr/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/usr/hadoop-3.2.1/share/hadoop/hdfs/*:/usr/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/usr/hadoop-3.2.1/share/hadoop/mapreduce/*:/usr/hadoop-3.2.1/share/hadoop/yarn:/usr/hadoop-3.2.1/share/hadoop/yarn/lib/*:/usr/hadoop-3.2.1/share/hadoop/yarn/*
    </value>
</property>
```
4. yarn-site.xml (`$HADOOP_HOME/etc/hadoop`)
```xml
<!-- host running the ResourceManager -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop</value>
</property>
<!-- auxiliary service required for the MapReduce shuffle -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- MapReduce application classpath (same value as in mapred-site.xml) -->
<property>
    <name>mapreduce.application.classpath</name>
    <value>
        /usr/hadoop-3.2.1/etc/hadoop:/usr/hadoop-3.2.1/share/hadoop/common/lib/*:/usr/hadoop-3.2.1/share/hadoop/common/*:/usr/hadoop-3.2.1/share/hadoop/hdfs:/usr/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/usr/hadoop-3.2.1/share/hadoop/hdfs/*:/usr/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/usr/hadoop-3.2.1/share/hadoop/mapreduce/*:/usr/hadoop-3.2.1/share/hadoop/yarn:/usr/hadoop-3.2.1/share/hadoop/yarn/lib/*:/usr/hadoop-3.2.1/share/hadoop/yarn/*
    </value>
</property>
```
5. workers (`$HADOOP_HOME/etc/hadoop`): the hosts that run a DataNode/NodeManager; for this single-node setup it lists only this host:
```
hadoop
```
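Hadoop 3 ships a small validator for these XML files; as a quick sanity check before starting anything (it reads the files under etc/hadoop by default):
```bash
# reports "ok" per file, or the location of any XML syntax error
hadoop conftest
```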
Starting the Hadoop Cluster
The first time Hadoop is started, the NameNode must be formatted:
```bash
$ hdfs namenode -format
2020-07-07 14:25:59,234 INFO common.Storage: Storage directory /usr/local/hadoop/dfs/name has been successfully formatted.
```
Start the two services (HDFS and YARN):
```bash
$ start-dfs.sh
$ start-yarn.sh
$ jps
14337 Jps
11608 jar
13610 SecondaryNameNode
13867 ResourceManager
13388 DataNode
13261 NameNode
13997 NodeManager
```
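As a further smoke test beyond jps (an optional sketch, not in the original write-up), the two web UIs can be probed and a sample job run; this assumes the dfs.namenode.http-address configured above (hadoop:9870) and YARN's default ResourceManager web port 8088:
```bash
# NameNode web UI (address set in hdfs-site.xml above)
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop:9870/
# ResourceManager web UI (YARN default port 8088)
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop:8088/
# run the bundled pi example to exercise YARN and the MapReduce classpath
hadoop jar /usr/hadoop-3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 2 5
```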
HDFS Commands
Check HDFS safe mode:
```bash
$ hdfs dfsadmin -safemode get
Safe mode is OFF
# Note: "Safe mode is OFF" means safe mode is disabled, so data can be read and written
```
List the root directory:
```bash
$ hdfs dfs -ls /
```
Create a directory:
```bash
$ hdfs dfs -mkdir /data
$ hdfs dfs -ls /
```
Create nested directories recursively:
```bash
$ hdfs dfs -mkdir -p /data/subdata/input
$ hdfs dfs -ls -R /
```
Upload a local file to an HDFS directory:
```bash
$ hdfs dfs -put jdk-8u231-linux-x64.tar.gz /data/
$ hdfs dfs -ls /data
```
Download a file from HDFS to the local filesystem:
```bash
$ hdfs dfs -get /data/jdk-8u231-linux-x64.tar.gz ./
$ ll
```
Copy a file:
```bash
$ hdfs dfs -cp /data/jdk-8u231-linux-x64.tar.gz /data/subdata/jdk.tar.gz
```
Delete a file or directory:
```bash
$ hdfs dfs -rm -r /data/subdata/
```
HDFS Administration Commands
Safe mode
```bash
$ hdfs dfsadmin -safemode get
# usage: hdfs dfsadmin -safemode get|enter|leave|wait
```
The report command:
```bash
$ hdfs dfsadmin -report
Configured Capacity: 37558423552 (34.98 GB)
Present Capacity: 29635686400 (27.60 GB)
DFS Remaining: 29440000000 (27.42 GB)
DFS Used: 195686400 (186.62 MB)
DFS Used%: 0.66%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (1):
Name: 192.168.1.103:9866 (hadoop)
Hostname: hadoop
Decommission Status : Normal
Configured Capacity: 37558423552 (34.98 GB)
DFS Used: 195686400 (186.62 MB)
Non DFS Used: 7922737152 (7.38 GB)
DFS Remaining: 29440000000 (27.42 GB)
DFS Used%: 0.52%
DFS Remaining%: 78.38%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Jul 07 14:54:09 CST 2020
Last Block Report: Tue Jul 07 14:28:45 CST 2020
Num of Blocks: 2
```
HDFS Trash
The trash is disabled by default. To enable it, configure core-site.xml:
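A minimal sketch, using the standard fs.trash.interval property (the number of minutes a deleted file is kept before being purged; 0, the default, keeps the trash disabled):
```xml
<!-- keep deleted files in the trash for 1440 minutes (one day) -->
<property>
    <name>fs.trash.interval</name>
    <value>1440</value>
</property>
```
With this set, `hdfs dfs -rm` moves files to a per-user .Trash directory instead of deleting them immediately.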