Connecting a Client Server

This guide explains how to connect to a Hadoop Eco cluster from a client server.

Prerequisites

Downloading the files

Check the configuration file download path on the cluster detail page, then download the file.

Code example: Downloading the configuration file

########################
# Download path
# - [YYYYMM]: the creation year and month from the cluster details (e.g., 202211)
# - [cluster-ID]: the cluster ID from the cluster details
wget https://objectstorage.kr-central-1.kakaoi.io/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-files/[YYYYMM]/[cluster-ID]/conf.tgz
 
# - Example path
wget https://objectstorage.kr-central-1.kakaoi.io/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-files/202211/3cc2d198-253a-44e9-9d1b-97565cb68829/conf.tgz
 
########################
# Extract the archive
tar zxf conf.tgz
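
After extraction, a quick sanity check helps before moving on. The path below is an assumption based on the configuration directories used later in this guide; adjust it if your archive layout differs.

# Check that the extracted Hadoop configuration directory exists
# (path assumed from the examples later in this guide)
ls conf/etc/hadoop/conf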

Adding host information

Add host entries so the client server can reach the cluster nodes.

Code example: Adding host information

$ sudo vi /etc/hosts
 
$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain  localhost
10.182.49.211 bigdata-hadoop-job.kep.k9d.in bigdata-hadoop-job
10.182.49.55    hadoopmst-hadoop-single-1
10.182.48.83    hadoopwrk-hadoop-single-1
10.182.49.70    hadoopwrk-hadoop-single-2
10.182.49.84    hadoopwrk-hadoop-single-3
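
After saving the entries, it is worth confirming that the cluster host names resolve before continuing. A minimal check, using the host names from the example above:

# Verify that a cluster host name resolves to the address in /etc/hosts
getent hosts hadoopmst-hadoop-single-1
 
# Optionally check network reachability (assumes ICMP is allowed)
ping -c 1 hadoopmst-hadoop-single-1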

Hadoop

Downloading the files

Before connecting, download the required JDK and Hadoop binaries.

Code example: Hadoop - downloading files

########################
# Download files
# Java, Hadoop
wget https://objectstorage.kr-central-1.kakaoi.io/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-files/hde-1.0.0/OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
wget https://objectstorage.kr-central-1.kakaoi.io/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-files/hde-1.0.0/hadoop-2.10.0.tar.gz
 
########################
# Extract the archives
tar zxf hadoop-2.10.0.tar.gz
tar zxf OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
 
########################
# Verify the files
$ ll
total 483728
drwxr-xr-x 4 deploy deploy      4096 Mar 30 15:50 ./
drwxr-x--- 6 deploy deploy      4096 Mar 30 15:46 ../
drwxr-xr-x 9 deploy deploy      4096 Oct 23  2019 hadoop-2.10.0/
-rw-r--r-- 1 deploy deploy 392115733 Mar 30 15:45 hadoop-2.10.0.tar.gz
drwxr-xr-x 8 deploy deploy      4096 Jul 15  2020 jdk8u262-b10/
-rw-r--r-- 1 deploy deploy 103200089 Mar 30 15:46 OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz

Connecting the Hadoop configuration

There are two ways to connect the Hadoop configuration: keep the downloaded configuration and use it as-is, or copy the main configuration files into the Hadoop installation.

Using the downloaded configuration as-is

Keep the downloaded configuration and add the following environment variables.

Code example: Using the downloaded configuration as-is

$ export JAVA_HOME=/home/deploy/client/jdk8u262-b10
$ export HADOOP_CONF_DIR=/home/deploy/client/conf/etc/hadoop/conf/
$ ./bin/hadoop fs -ls hdfs:///
Found 12 items
drwxrwxrwt   - yarn   hadoop          0 2022-03-30 13:10 hdfs:///app-logs
drwxrwxr-t   - hdfs   hadoop          0 2022-03-30 14:12 hdfs:///apps
drwxr-xr-t   - yarn   hadoop          0 2022-03-30 13:06 hdfs:///ats
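
The exports above apply only to the current shell session. To avoid repeating them, you can append them to your shell profile; a minimal sketch, assuming a bash shell and the paths used above:

# Persist the environment variables across sessions (bash assumed)
cat >> ~/.bashrc <<'EOF'
export JAVA_HOME=/home/deploy/client/jdk8u262-b10
export HADOOP_CONF_DIR=/home/deploy/client/conf/etc/hadoop/conf/
EOF
source ~/.bashrc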

Using the main configuration files

Copy the main configuration files (core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml) into the Hadoop configuration directory and use them.

Code example: Using the main configuration files

$ cp /home/deploy/client/conf/etc/hadoop/conf/core-site.xml /home/deploy/client/hadoop-2.10.0/etc/hadoop/
$ cp /home/deploy/client/conf/etc/hadoop/conf/hdfs-site.xml /home/deploy/client/hadoop-2.10.0/etc/hadoop/
$ cp /home/deploy/client/conf/etc/hadoop/conf/yarn-site.xml /home/deploy/client/hadoop-2.10.0/etc/hadoop/
$ cp /home/deploy/client/conf/etc/hadoop/conf/mapred-site.xml /home/deploy/client/hadoop-2.10.0/etc/hadoop/
 
$ export JAVA_HOME=/home/deploy/client/jdk8u262-b10
 
$ ./bin/hadoop fs -ls hdfs:///
Found 12 items
drwxrwxrwt   - yarn   hadoop          0 2022-03-30 13:10 hdfs:///app-logs
drwxrwxr-t   - hdfs   hadoop          0 2022-03-30 14:12 hdfs:///apps
drwxr-xr-t   - yarn   hadoop          0 2022-03-30 13:06 hdfs:///ats
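
Listing HDFS confirms the NameNode connection; to also confirm that the YARN settings were picked up, you can query the ResourceManager. A sketch, assuming the same environment as above:

# List the cluster's NodeManagers via the ResourceManager
./bin/yarn node -list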

Hive

Downloading the files

Before connecting, download the required JDK and Apache Hive binaries.

Code example: Hive - downloading files

########################
# Download files
# Java, Hive
wget https://objectstorage.kr-central-1.kakaoi.io/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-files/hde-1.0.0/OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
wget https://objectstorage.kr-central-1.kakaoi.io/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-files/hde-1.0.0/apache-hive-2.3.2-bin.tar.gz
 
########################
# Extract the archives
tar zxf apache-hive-2.3.2-bin.tar.gz
tar zxf OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
 
########################
# Verify the files
$ ll
total 710048
drwxr-xr-x  6 deploy deploy      4096 Mar 30 17:12 ./
drwxr-x---  7 deploy deploy      4096 Mar 30 17:15 ../
drwxr-xr-x 10 deploy deploy      4096 Mar 30 17:12 apache-hive-2.3.2-bin/
-rw-r--r--  1 deploy deploy 231740978 Mar 30 15:44 apache-hive-2.3.2-bin.tar.gz
drwxr-xr-x  8 deploy deploy      4096 Jul 15  2020 jdk8u262-b10/
-rw-r--r--  1 deploy deploy 103200089 Mar 30 15:46 OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz

Connecting the Hive configuration

To connect the Hive configuration, Hadoop must be set up first. After configuring Hadoop, export the required environment variables, point HIVE_CONF_DIR at the downloaded Hive configuration, and then connect to the server.

Code example: Connecting the Hive configuration

export JAVA_HOME=/home/deploy/client/jdk8u262-b10
export HADOOP_HOME=/home/deploy/client/hadoop-2.10.0/
export HIVE_CONF_DIR=/home/deploy/client/conf/etc/hive/conf
 
$ ./bin/beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/deploy/client/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/deploy/client/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.2 by Apache Hive
beeline> !connect jdbc:hive2://hadoopmst-hadoop-single-1:10000/default;
Connecting to jdbc:hive2://hadoopmst-hadoop-single-1:10000/default;
Enter username for jdbc:hive2://hadoopmst-hadoop-single-1:10000/default:
Enter password for jdbc:hive2://hadoopmst-hadoop-single-1:10000/default:
Connected to: Apache Hive (version 2.3.2)
Driver: Hive JDBC (version 2.3.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoopmst-hadoop-single-1:100> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.977 seconds)
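
For scripted, non-interactive use, beeline also accepts the JDBC URL and a query on the command line. A minimal sketch using the same URL as above:

# Connect and run a single query without the interactive prompt
./bin/beeline -u "jdbc:hive2://hadoopmst-hadoop-single-1:10000/default" -e "show databases;"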

Spark

Downloading the files

Before connecting, download the required JDK and Apache Spark binaries.

Code example: Spark - downloading files

########################
# Download files
# Java, Spark
wget https://objectstorage.kr-central-1.kakaoi.io/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-files/hde-1.0.0/OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
wget https://objectstorage.kr-central-1.kakaoi.io/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-files/hde-1.0.0/spark-2.4.6-bin-without-hadoop.tgz
 
 
########################
# Extract the archives
tar zxf spark-2.4.6-bin-without-hadoop.tgz
tar zxf OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
 
########################
# Verify the files
$ ll
total 874336
drwxr-xr-x  7 deploy deploy      4096 Mar 30 17:39 ./
drwxr-x---  7 deploy deploy      4096 Mar 30 19:08 ../
drwxr-xr-x  9 deploy deploy      4096 Oct 23  2019 hadoop-2.10.0/
-rw-r--r--  1 deploy deploy 392115733 Mar 30 15:45 hadoop-2.10.0.tar.gz
drwxr-xr-x 13 deploy deploy      4096 Mar 30 19:02 spark-2.4.6-bin-without-hadoop/
-rw-r--r--  1 deploy deploy 168225415 Mar 30 15:43 spark-2.4.6-bin-without-hadoop.tgz

Connecting the Spark configuration

To use Spark, create a spark-env.sh file in the conf directory of the extracted Spark folder and add the basic settings below.
Make sure the file paths in these settings point to the actual locations on your client.

Code example: Connecting the Spark configuration

$ cat conf/spark-env.sh
#!/usr/bin/env bash
 
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
 
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
 
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
 
# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos
 
# Options read in YARN client/cluster mode
# - SPARK_CONF_DIR, Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - YARN_CONF_DIR, to point Spark towards YARN configuration files when you use YARN
# - SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Executor (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)
 
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_DAEMON_CLASSPATH, to set the classpath for all daemons
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
 
# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR      Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR       Where log files are stored.  (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR       Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING  A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS      The scheduling priority for daemons. (Default: 0)
# - SPARK_NO_DAEMONIZE  Run the proposed command in the foreground. It will not output a PID file.
# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
# You might get better performance to enable these options if using native BLAS (see SPARK-21305).
# - MKL_NUM_THREADS=1        Disable multi-threading of Intel MKL
# - OPENBLAS_NUM_THREADS=1   Disable multi-threading of OpenBLAS
export TERM=xterm-color
export JAVA_HOME=${JAVA_HOME:-/usr/lib/jdk}
 
export SPARK_HOME=${SPARK_HOME:-/opt/spark}
export SPARK_LOG_DIR=${SPARK_LOG_DIR:-/var/log/spark}
export SPARK_PID_DIR=${SPARK_PID_DIR:-/hadoop/pid}
 
export HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-/opt/hadoop}
export HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-/opt/hadoop}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-/opt/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
export YARN_CONF_DIR=${YARN_CONF_DIR:-/etc/hadoop/conf}
export SPARK_CONF_DIR=${SPARK_CONF_DIR:-/etc/spark/conf}
 
# Let's run everything with JVM runtime, instead of Scala
export SPARK_LAUNCH_WITH_SCALA=0
export SPARK_LIBRARY_PATH=${SPARK_LIBRARY_PATH:-${SPARK_HOME}/lib}
export SCALA_LIBRARY_PATH=${SCALA_LIBRARY_PATH:-${SPARK_HOME}/lib}
 
export SPARK_DIST_CLASSPATH=$(/home/deploy/client/hadoop-2.10.0/bin/hadoop classpath)
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:${HADOOP_HOME}/lib/native
 
export STANDALONE_SPARK_MASTER_HOST=`hostname -f`
#export SPARK_MASTER_HOST=`hostname -f`
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=18080
 
export SPARK_WORKER_DIR=${SPARK_WORKER_DIR:-/var/run/spark/work}
export SPARK_WORKER_PORT=7078
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_MASTER_URL=spark://$STANDALONE_SPARK_MASTER_HOST:$SPARK_MASTER_PORT
export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=hdfs:///var/log/spark/apps -Dspark.history.ui.port=18082"
 
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/hadoop/lib/native
export SPARK_YARN_USER_ENV="JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH,LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
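
With spark-env.sh in place, a quick way to confirm the configuration is to start an interactive shell against the cluster. A sketch, assuming the exports shown in Running Spark below:

# Launch an interactive Spark shell on the cluster's YARN
./bin/spark-shell --master yarn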

Running Spark

If the spark-env.sh file was configured correctly, no additional setup is needed to run Spark jobs.
For reference, the following shows a job run using the basic settings added in Connecting the Spark configuration.

Code example: Running Spark

export JAVA_HOME=/home/deploy/client/jdk8u262-b10
export SPARK_HOME=/home/deploy/client/spark-2.4.6-bin-without-hadoop
 
export HADOOP_HOME=/home/deploy/client/hadoop-2.10.0
export HADOOP_HDFS_HOME=/home/deploy/client/hadoop-2.10.0
export HADOOP_MAPRED_HOME=/home/deploy/client/hadoop-2.10.0
export HADOOP_YARN_HOME=/home/deploy/client/hadoop-2.10.0
 
export HADOOP_CONF_DIR=/home/deploy/client/conf/etc/hadoop/conf
export YARN_CONF_DIR=/home/deploy/client/conf/etc/hadoop/conf
 
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi --master yarn \
./examples/jars/spark-examples_2.11-2.4.6.jar 100
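
If the submission succeeds, the driver log prints the estimated value of Pi, and the finished application can be confirmed through the YARN CLI. A sketch using the Hadoop installation downloaded above:

# Confirm the job completed by listing finished YARN applications
$HADOOP_HOME/bin/yarn application -list -appStates FINISHED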

HBase

Downloading the files

Before connecting, download the required JDK, Hadoop, and HBase binaries.

Code example: HBase - downloading files

########################
# Download files
# Java, Hadoop, HBase
wget https://objectstorage.kr-central-1.kakaoi.io/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-files/hde-1.0.0/OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
wget https://objectstorage.kr-central-1.kakaoi.io/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-files/hde-1.0.0/hadoop-2.10.0.tar.gz
wget https://objectstorage.kr-central-1.kakaoi.io/v1/c745e6650f0341a68bb73fa222e88e9b/kbp-files/hde-1.0.0/hbase-1.4.13-bin.tar.gz
 
########################
# Extract the archives
tar zxf hadoop-2.10.0.tar.gz
tar zxf hbase-1.4.13-bin.tar.gz
tar zxf OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz
 
########################
# Verify the files
$ ll
total 599372
drwxr-xr-x 6 deploy deploy      4096 Mar 30 19:31 ./
drwxr-x--- 8 deploy deploy      4096 Mar 30 19:43 ../
-rw-r--r-- 1 deploy deploy     27517 Mar 30 19:14 conf.tgz
drwxr-xr-x 4 deploy deploy      4096 Mar 30 19:31 etc/
drwxr-xr-x 9 deploy deploy      4096 Oct 23  2019 hadoop-2.10.0/
-rw-r--r-- 1 deploy deploy 392115733 Mar 30 15:45 hadoop-2.10.0.tar.gz
-rw-r--r-- 1 deploy deploy     21967 Mar 30 19:14 hadoop-conf.tgz
drwxr-xr-x 7 deploy deploy      4096 Mar 30 19:29 hbase-1.4.13/
-rw-r--r-- 1 deploy deploy 118343766 Mar 30 15:42 hbase-1.4.13-bin.tar.gz
-rw-r--r-- 1 deploy deploy      5417 Mar 30 19:14 hbase-conf.tgz
drwxr-xr-x 8 deploy deploy      4096 Jul 15  2020 jdk8u262-b10/
-rw-r--r-- 1 deploy deploy 103200089 Mar 30 15:46 OpenJDK8U-jdk_x64_linux_hotspot_8u262b10.tar.gz

Connecting the HBase configuration

To connect the HBase configuration, configure Hadoop as well.

Code example: Connecting the HBase configuration

export JAVA_HOME=/home/deploy/hbase_client/jdk8u262-b10
export HADOOP_CONF_DIR=/home/deploy/hbase_client/etc/hadoop/conf/
export HBASE_CONF_DIR=/home/deploy/hbase_client/etc/hbase/conf/
 
$ ./hbase-1.4.13/bin/hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.13, r38bf65a22b7e9320f07aeb27677e4533b9a77ef4, Sun Feb 23 02:06:36 PST 2020
 
hbase(main):001:0> status
1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667 average load
 
hbase(main):002:0> exit
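
From here, standard HBase shell commands work as usual. A minimal smoke test with a hypothetical table and column family:

# Create a table with one column family, write a cell, and read it back
create 'test_table', 'cf'
put 'test_table', 'row1', 'cf:c1', 'v1'
scan 'test_table'
 
# Clean up the test table
disable 'test_table'
drop 'test_table'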