[hadoop] hadoop_1


Hadoop


Only ten years ago, a team whose database handled tens of millions of records could boast that it knew how to process large-scale data.

In the last two to three years, the amount of information a database has to handle has grown rapidly.

Today you need to be able to process big data that runs to hundreds of millions of records.

In the past most information was text; now images and video account for a growing share.

Relational databases, built to process text-centric structured data, cannot cope with unstructured data dominated by images and video.

At first, growing data volumes were handled by adding storage.

DAS, NAS, and SAN storage was added to hold the data, and the database processed it, but this approach hit a limit: adding storage alone does not make big data processable.

Massively Parallel Processing (MPP) then appeared. It splits a program into parts so that many processes can execute the parts simultaneously, allowing hundreds or thousands of processes to work on a single program. This performance gain was effective for structured data, but it was not cost-effective for explosively growing data. The investment required was simply more than customers could afford.


This is when Hadoop appeared like a comet.


Hadoop is open-source software developed for processing large volumes of data.

Its components mirror Google's technology: the Hadoop Distributed File System (HDFS) corresponds to Google's distributed file system, Hadoop MapReduce to Google's MapReduce, and HBase to Google's Bigtable.

Besides its core components, the distributed file system and MapReduce, Hadoop includes systems that cover a variety of other functions. Pig and Hive are solutions that make Hadoop programs easier to write.

Pig was developed at Yahoo and is now an Apache project in the Hadoop ecosystem. It is a programming language designed to simplify loading and transforming data and sorting the results. Hive is a solution that lets Hadoop be operated as a data warehouse (DW). Developed at Facebook, Hive offers a query language similar to the SQL used in relational databases.

Put simply, MapReduce is a technique that distributes data, processes the pieces, and then merges the results into one.

Unlike MPP, Hadoop is convenient to use: developers do not have to distribute and merge the data themselves, because Hadoop's MapReduce does this automatically.
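The distribute, process, and merge idea can be sketched in a few lines of plain Python. This is only a single-process illustration of the three phases (map, shuffle, reduce) using a word count; it does not use the Hadoop API, and the function names and sample chunks are made up for the example.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: merge each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

# Each string stands in for a block of input handled by a separate worker.
chunks = ["big data big", "data hadoop", "hadoop big"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

In a real cluster the map calls run on different machines and the framework performs the shuffle; the developer writes only the map and reduce logic.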

A 'Hadoop ecosystem' emerged.

It includes HBase, a NoSQL database that can store large volumes of data; Hive and Pig, which run analyses using SQL-like queries; and ZooKeeper, which supports building distributed coordination systems.

Hadoop use grew first among small and mid-sized companies, which no longer had to buy expensive equipment to analyze big data.

Analytics vendors such as IBM, Oracle, and Teradata demanded enormous up-front capital for big data analysis, including software licensing and server costs.

Their products are also closed source, so once one is adopted it is hard to move to an alternative.

For analyzing big data easily, simply, conveniently, and quickly, few technologies compare with Hadoop.

However, processing big data with Hadoop is less reliable than analysis with a conventional database.

If a database boasts 99.9999% availability, Hadoop offers 99.99%. The difference is only a few decimal places, but it matters a great deal to companies that process and analyze critical information, such as those in finance.

Database vendors therefore changed course: rather than ignoring Hadoop, they embraced it to win new customers.

Hadoop has been integrated into nearly every product released by database vendors, including IBM DB2, EMC Greenplum, and Oracle Big Data Appliance.


Overview – why Hadoop is used

Suppose you need to sort 10 TB of data. On a single server this would take two to three days, which is beyond what one machine can reasonably handle. A Hadoop cluster of 100 servers can do it in 30 to 40 minutes. In short, Hadoop is a framework for distributed processing of large-scale data.
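As a rough sanity check on those figures, the following sketch divides the single-server time evenly across the cluster. It assumes perfect scaling with no coordination or network overhead, which a real cluster will not achieve; the function name is illustrative.

```python
def ideal_minutes(single_server_days, servers):
    """Wall-clock minutes if the work divides perfectly across servers."""
    return single_server_days * 24 * 60 / servers

low = ideal_minutes(2, 100)
high = ideal_minutes(3, 100)
print(f"{low:.1f} to {high:.1f} minutes")
```

Two to three days on one server works out to roughly half an hour to three quarters of an hour on 100 servers, which is in line with the estimate quoted in the text.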


Q. Why process it on a Hadoop cluster? Wouldn't distributed processing on an existing DBMS such as Oracle work?

An RDBMS such as Oracle was designed with a single server in mind, not a distributed environment. To increase the data-processing capacity of such software you basically have to add resources to the server: more memory, more CPUs, or more disks. This approach is called scale-up.


Q. If scale-up is how an RDBMS improves performance, what does Hadoop do?

Instead of growing capacity by attaching more resources to one server, you can grow the capacity of the whole system by adding more servers. This is called scale-out, and it is how distributed systems such as NoSQL stores and Hadoop increase capacity. Unlike scale-up, which relies on expensive hardware, scale-out typically uses many low-cost machines.


Overall structure of a big data processing system

hadoop_img_1

There is the big data itself; HDFS (Hadoop Distributed File System) and MapReduce, which store and process it; and NoSQL, the tool used to view the processed data.

Q. I have heard of R. What is it?

R is a language for statistical computing; in this architecture it serves as the data visualization module, visualizing the data analyzed by the processing modules. It is mainly used to draw graphs from processed data in order to understand the data's characteristics.

Data storage/processing modules (HDFS & MapReduce)

Hadoop stores and processes the big data; it can be called the brain of a big data system.

Hadoop consists of two main modules.


HDFS – the Hadoop Distributed File System

MapReduce – the distributed processing system


The collected data is stored in HDFS, and the data stored in HDFS is processed with MapReduce.

Hadoop is a module for batch processing of large-scale data.

It is not intended for real-time data analysis.

To analyze data, you run a large batch job in Hadoop and then apply an analysis tool to the data it produces.

One approach uses an existing relational database: if the data produced by Hadoop is relatively small, load it into an RDBMS and analyze it with the database's own features.


NoSQL

Data produced by Hadoop from big data is generally very large and has no fixed schema, so it is a poor fit for a conventional relational database.

As the name implies, NoSQL (read here as "No Structured Query Language"; it is also commonly expanded as "Not Only SQL") is used when you want to create, read, update, and delete (CRUD) data that is not in the structured table format of an RDBMS.

It is designed for distributed environments.

It can handle far more data and traffic than a conventional RDBMS.

Examples include MongoDB, HBase, and Cassandra.

Hadoop and NoSQL are the usual combination for processing and analysis in a big data system.
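To make the idea of CRUD without a fixed table schema concrete, here is a toy in-memory store. It is a stand-in written for this note, not the client API of MongoDB, HBase, or Cassandra; the class name, keys, and fields are all hypothetical.

```python
class DocumentStore:
    """Toy in-memory key/document store; a stand-in, not a real NoSQL client."""

    def __init__(self):
        self._docs = {}

    def create(self, key, doc):
        self._docs[key] = dict(doc)

    def read(self, key):
        return self._docs.get(key)

    def update(self, key, fields):
        self._docs[key].update(fields)

    def delete(self, key):
        self._docs.pop(key, None)

store = DocumentStore()
# Documents under different keys need not share the same fields.
store.create("user:1", {"name": "kim"})
store.create("video:9", {"title": "1.avi", "size_mb": 150})
store.update("user:1", {"email": "kim@example.com"})
store.delete("video:9")
print(store.read("user:1"))
```

The point of the sketch is that a new field such as `email` can be added to one record without altering a table definition, which is what a relational schema would require.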

Search engines are used when data must be accessed in more complex ways than the key/value-style access that an RDBMS or NoSQL store provides.

Open-source search engine projects include Lucene, Solr, and Elasticsearch.


A well-known case of Hadoop used for large-scale data processing is The New York Times. For a large project to convert 11 million articles published from 1851 to 1980 into PDF, it used Amazon EC2, S3, and the Hadoop platform instead of buying new hardware and software.

The job finished in a single day and cost only 1,450 dollars.

Because Hadoop is open software, it can be used free of charge.

A comparable set of open-source development tools on the Internet today is LAMP (Linux, Apache, MySQL, PHP/Python). With Linux as the operating system, Apache as the web server, MySQL as the database, and PHP or Python as the development language, a system can be built at low cost.


* Installation

1. Install VMware 12

2. Create the folder CentOS_master under C:\hadoop

3. Install Linux

4. VMware Tools

5. JDK

http://java.oracle.com

a. Log in with the hadoop account

b. Open Firefox, download the JDK, and save it

c. It is saved in the /home/hadoop/다운로드 folder (the Downloads folder on a Korean-locale system)

d. Installing the JDK only requires extracting the archive

Extract it as root, then give the hadoop user access to it (the chown below makes hadoop the owner)

[hadoop@localhost 바탕화면]$ su -
Password: 
[root@localhost ~]# cd /usr/local 
[root@localhost local]# pwd
/usr/local
[root@localhost local]# tar -xvf /home/hadoop/다운로드/jdk-8u151-linux-x64.tar.gz
[root@localhost local]# ls -l /usr/local    # check that it was extracted
[root@localhost local]# chown -R hadoop:hadoop /usr/local/jdk1.8.0_111/    # use the JDK directory name that ls actually shows


6. Eclipse

http://www.eclipse.org

a. Log in with the hadoop account

b. Open Firefox, download Eclipse, and save it

c. It is saved in the /home/hadoop/다운로드 folder

d. Installing Eclipse only requires extracting the archive

Extract it as root, then give the hadoop user access to it

[root@localhost local]# pwd
/usr/local
[root@localhost local]# ls -l /home/hadoop/다운로드
[root@localhost local]# tar -xvf /home/hadoop/다운로드/eclipse-java-neon-2-linux-gtk-x86_64.tar.gz
[root@localhost local]# ls -l /usr/local    # check that it was extracted
[root@localhost local]# ls -l ./eclipse/
[root@localhost local]# chown -R hadoop:hadoop /usr/local/eclipse/


7. Hadoop

http://archive.apache.org/dist/hadoop/common/

a. Log in with the hadoop account

b. Open Firefox, download Hadoop, and save it

c. It is saved in the /home/hadoop/다운로드 folder

d. Installing Hadoop only requires extracting the archive

Extract it as root, then give the hadoop user access to it

[root@localhost local]# pwd
/usr/local
[root@localhost local]# ls -l /home/hadoop/다운로드
[root@localhost local]# tar -xvf /home/hadoop/다운로드/hadoop-2.7.3.tar.gz
[root@localhost local]# ls -l ./hadoop-2.7.3/
[root@localhost local]# chown -R hadoop:hadoop hadoop-2.7.3/


8. Remove the CentOS OpenJDK

This must be run as administrator (root)

[root@localhost local]# pwd
/usr/local
[root@localhost local]# java -version

a. List the installed OpenJDK package names

# rpm -qa | grep jdk

java-1.7.0-openjdk-1.7.0.51-2.4.4.1.el6_5.x86_64

b. Copy the package name shown and remove it

# yum remove java-1.7.0-openjdk-1.7.0.51-2.4.4.1.el6_5.x86_64

c. Once OpenJDK is completely removed, java -version reports an error

[root@localhost local]# java -version
-bash: /usr/bin/java: No such file or directory

Next, add the JDK and Hadoop paths to .bash_profile


9. Shell environment settings (.bash_profile)

The .bash_profile file must be configured identically on all nodes.

[hadoop@localhost 바탕화면]$ cd
[hadoop@localhost ~]$ pwd
/home/hadoop
[hadoop@localhost ~]$ java -version
bash: java: command not found
[hadoop@localhost ~]$ cat .bash_profile
[hadoop@localhost ~]$ vi .bash_profile    # press i to insert; Esc, then :q! or :wq to leave
export PATH=$PATH:$HOME/bin
export JAVA_HOME=/usr/local/jdk1.8.0_121    # must match the JDK directory that exists under /usr/local
export HADOOP_INSTALL=/usr/local/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_INSTALL/bin

Entries in PATH are separated by a colon (:)

[hadoop@localhost ~]$ source .bash_profile
[hadoop@localhost ~]$ hadoop version
[hadoop@localhost ~]$ java -version
[hadoop@localhost ~]$ javac -version
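If `hadoop version` or `java -version` still fails after sourcing the profile, the usual cause is a bin directory missing from PATH. The sketch below checks that on a dictionary of environment values; the helper name is hypothetical and the sample values are examples in the style of the exports above, not a statement of what is installed.

```python
def missing_path_entries(env):
    """Return the bin directories that should be on PATH but are not."""
    path_entries = env.get("PATH", "").split(":")
    required = [
        env["JAVA_HOME"] + "/bin",
        env["HADOOP_INSTALL"] + "/bin",
    ]
    return [entry for entry in required if entry not in path_entries]

# Example values only; on a real node pass dict(os.environ) instead.
sample_env = {
    "JAVA_HOME": "/usr/local/jdk1.8.0_121",
    "HADOOP_INSTALL": "/usr/local/hadoop-2.7.3",
    "PATH": "/usr/bin:/usr/local/jdk1.8.0_121/bin",
}
missing_entries = missing_path_entries(sample_env)
print(missing_entries)
```

An empty list means both bin directories are on PATH; anything listed still needs to be added to the final `export PATH` line.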


10. Check the IP addresses

Configure /etc/hosts as root

The cluster can be tested with two hosts

First check your own IP address

[hadoop@localhost 바탕화면]$ su -
Password:
[root@localhost ~]# ifconfig
[root@localhost ~]# vi /etc/hosts
192.168.121.133 master
192.168.111.128 backup
192.168.111.129 slave1
192.168.111.128 slave2

The two hosts are assigned as follows

Host 1 : master, backup, slave2

Host 2 : slave1

[root@master 바탕화면]# ping slave1
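Before copying /etc/hosts to every node it can help to see which names share an address. The following sketch parses hosts-format text and reports shared addresses; the parser is a simplified illustration written for this note, and the sample text is the set of entries from this section.

```python
from collections import defaultdict

def parse_hosts(text):
    """Map each IP address to the host names listed for it."""
    names_by_ip = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        names_by_ip[ip].extend(names)
    return dict(names_by_ip)

hosts_text = """
192.168.121.133 master
192.168.111.128 backup
192.168.111.129 slave1
192.168.111.128 slave2
"""
names_by_ip = parse_hosts(hosts_text)
shared = {ip: names for ip, names in names_by_ip.items() if len(names) > 1}
print(shared)
```

Names that share an address are fine when they really are roles on the same machine, as with backup and slave2 here; otherwise a shared address points to a typo in the file.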


11. Firewall settings – iptables

[hadoop@master 바탕화면]$ su - 
[root@master ~]# vi /etc/sysconfig/iptables
# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
# /16 fixes only the first two octets (192.168), so any 192.168.x.x address matches
-A INPUT -s 192.168.0.0/16 -d 192.168.0.0/16 -j ACCEPT 
-A OUTPUT -s 192.168.0.0/16 -d 192.168.0.0/16 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited

COMMIT

The firewall rules were changed, so restart the service to apply them
[root@master ~]# service iptables restart
iptables: Setting chains to policy ACCEPT: filter    [ OK ]
iptables: Flushing firewall rules:            [ OK ]
iptables: Unloading modules:             [ OK ]
iptables: Applying firewall rules:              [ OK ]
[root@master ~]# ping slave1
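To see which addresses the 192.168.0.0/16 rule covers, Python's standard `ipaddress` module can test membership. This is only an illustration of the CIDR arithmetic, not a way to manage iptables; the `10.0.0.5` address is an arbitrary example of a host outside the range.

```python
import ipaddress

ALLOWED = ipaddress.ip_network("192.168.0.0/16")

def is_allowed(address):
    """True if the address falls inside 192.168.0.0/16."""
    return ipaddress.ip_address(address) in ALLOWED

for addr in ["192.168.121.133", "192.168.111.129", "10.0.0.5"]:
    print(addr, is_allowed(addr))
```

A /16 prefix leaves 16 bits free, so the rule matches 65,536 addresses. That is convenient for a lab cluster whose nodes sit on different 192.168.x subnets, but it is broader than a production firewall would normally allow.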


12. Logging in without a password using an SSH RSA key

The nodes exchange trusted keys (authorized_keys) in advance, so that an ssh connection is made directly without typing a password.

Accounts that have not exchanged keys this way cannot connect with a key, which improves security.

Set the permissions of authorized_keys to 644.

Do this on every node.

a. First generate the key pair

[hadoop@master ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
c2:69:3b:36:4d:4f:bc:18:df:3f:b0:00:02:f1:38:25 hadoop@master
The key's randomart image is:
+--[ RSA 2048]----+
|  E..     |
|  .=      |
|  o..     |
|   o....    |
|   =.S.o   |
|   . = *.o.  |
|   = o +..o  |
|   . o  ... |
|       .. |
+-----------------+

b. Check that .ssh exists

[hadoop@master 바탕화면]$ cd /home/hadoop
[hadoop@master ~]$ ls -la    # check that .ssh is there
[hadoop@master ~]$ cd .ssh
[hadoop@master .ssh]$ ls -l
total 12
-rw-------. 1 hadoop hadoop 1675 2014-12-23 11:46 id_rsa        # private key
-rw-r--r--. 1 hadoop hadoop 394 2014-12-23 11:46 id_rsa.pub    # public key

c. Add master's public key to authorized_keys

[hadoop@master .ssh]$ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
[hadoop@master .ssh]$ ls -l
total 16
-rw-r--r--. 1 hadoop hadoop 394 2014-12-23 11:50 authorized_keys
-rw-------. 1 hadoop hadoop 1675 2014-12-23 11:46 id_rsa
-rw-r--r--. 1 hadoop hadoop 394 2014-12-23 11:46 id_rsa.pub

d. Generate a private/public key pair on slave1 as well

[hadoop@slave1 ~]$ ssh-keygen -t rsa

e. Append slave1's public key to master's authorized_keys file

Append the public keys of backup and slave2 in the same way

[hadoop@master .ssh]$ ssh hadoop@slave1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'slave1 (192.168.164.131)' can't be established.
RSA key fingerprint is 8e:6b:98:d8:cd:c2:a4:00:25:ea:32:28:02:76:ba:b9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave1,192.168.164.131' (RSA) to the list of known hosts.
hadoop@slave1's password:
[hadoop@master .ssh]$ ssh hadoop@backup cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@master .ssh]$ ssh hadoop@slave2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

f. Redistribute the public keys to every node

All nodes share one another's public keys.

[hadoop@master .ssh]$ scp authorized_keys hadoop@slave1:~/.ssh/
[hadoop@master .ssh]$ scp authorized_keys hadoop@backup:~/.ssh/
[hadoop@master .ssh]$ scp authorized_keys hadoop@slave2:~/.ssh/
[hadoop@master .ssh]$ ssh-add    # only needed on master
Identity added: /home/hadoop/.ssh/id_rsa (/home/hadoop/.ssh/id_rsa)

g. Set the permissions to 644

[hadoop@master .ssh]$ ls -la /home/hadoop/
[hadoop@master .ssh]$ chmod 644 ~/.ssh/authorized_keys
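When key login still prompts for a password, file permissions are a common culprit: with its default strict-mode checks, sshd ignores an authorized_keys file that group or other users can write to. The sketch below shows how to read a file's mode bits; it runs against a throwaway file in a temporary directory, the key text is a placeholder, and it assumes a POSIX system.

```python
import os
import stat
import tempfile

def mode_of(path):
    """Return the permission bits of a file, e.g. 0o644."""
    return stat.S_IMODE(os.stat(path).st_mode)

def others_can_write(path):
    """True if group or others have write permission on the file."""
    return bool(mode_of(path) & (stat.S_IWGRP | stat.S_IWOTH))

# Demonstrate on a throwaway file rather than a real authorized_keys.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "authorized_keys")
    with open(demo, "w") as f:
        f.write("ssh-rsa PLACEHOLDER hadoop@master\n")
    os.chmod(demo, 0o644)
    demo_mode = mode_of(demo)
    demo_writable_by_others = others_can_write(demo)

print(oct(demo_mode), demo_writable_by_others)
```

On a real node, point `mode_of` at `~/.ssh/authorized_keys` (expanded with `os.path.expanduser`) to confirm the result of the chmod above.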

[Practice]

[hadoop@master .ssh]$ ssh hadoop@master date
[hadoop@master .ssh]$ ssh hadoop@backup date
[hadoop@master .ssh]$ ssh hadoop@slave1 date
[hadoop@master .ssh]$ ssh hadoop@slave2 date

[hadoop@slave1 .ssh]$ ssh hadoop@master date
[hadoop@slave1 .ssh]$ ssh hadoop@backup date
[hadoop@slave1 .ssh]$ ssh hadoop@slave1 date
[hadoop@slave1 .ssh]$ ssh hadoop@slave2 date

[Practice]

※ Log in from master to slave1

[hadoop@master ~]$ ssh slave1
Last login: Wed Feb 17 13:57:38 2016 from master
[hadoop@slave1 ~]$

※ Log in from slave1 to master

[hadoop@slave1 ~]$ ssh master
hadoop@master's password: 
Last login: Wed Feb 17 14:55:15 2016 from slave1
[hadoop@master ~]$ 


13. Edit the Hadoop configuration files

hadoop-env.sh sets the system environment values applied to every process that Hadoop runs

This script runs first, and the other scripts run after it.

Location of the configuration files : /usr/local/hadoop-2.7.3/etc/hadoop/

a. hadoop-env.sh

Register the JDK installation path with Hadoop

[hadoop@master hadoop]$ cd /usr/local/hadoop-2.7.3/etc/hadoop/
[hadoop@master hadoop]$ ls -l
[hadoop@master hadoop]$ vi hadoop-env.sh    # environment file
export JAVA_HOME=/usr/local/jdk1.8.0_121

Add this at the very end of the file

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop-2.7.3/lib/native"

b. masters

File that registers the server that will run the secondary NameNode

For a single machine, specify localhost (this is the default).

c. slaves

Sets the servers that will run DataNodes

For a single machine, specify localhost (this is the default).

With several DataNodes, list one server name per line.

[hadoop@master hadoop]$ cat slaves
localhost
[hadoop@master hadoop]$ vi slaves
slave1
slave2

d. Edit core-site.xml

Configuration file for low-level system settings such as log files, network tuning, I/O tuning, file system tuning, and compression

core-site.xml sets the environment information shared by HDFS and MapReduce.

It overrides core-default.xml, which is bundled in hadoop-core-1.x.x.jar (in Hadoop 2.x the defaults ship in the hadoop-common JAR).

If a value is not set in core-site.xml, the default from core-default.xml is used.

For a detailed description of the common properties, see the following address.

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml

[hadoop@master hadoop]$ vi core-site.xml 
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
         <name>hadoop.tmp.dir</name>
         <value>/usr/local/hadoop-2.7.3/tmp</value>
     </property>
</configuration>
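The way a site file overrides the bundled defaults can be sketched by reading both files into dictionaries and letting the site values win. This is an illustration of the lookup order only, not Hadoop's own `Configuration` class; the `defaults_xml` text is a small stand-in, not the contents of the real core-default.xml.

```python
import xml.etree.ElementTree as ET

def load_properties(xml_text):
    """Read <property><name>/<value> pairs from Hadoop-style XML."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): prop.findtext("value").strip()
        for prop in root.findall("property")
    }

# Illustrative stand-in for the bundled defaults file.
defaults_xml = """
<configuration>
  <property><name>fs.default.name</name><value>file:///</value></property>
  <property><name>io.file.buffer.size</name><value>4096</value></property>
</configuration>
"""
site_xml = """
<configuration>
  <property><name>fs.default.name</name><value>hdfs://master:9000</value></property>
</configuration>
"""
# Later dictionaries win, so site values replace defaults with the same name.
effective = {**load_properties(defaults_xml), **load_properties(site_xml)}
print(effective)
```

Properties that the site file does not mention keep their default value, which is why a short core-site.xml is enough.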

※ Hadoop Distributed File System (HDFS)

e. Edit hdfs-site.xml

Change the data storage settings

hdfs-site.xml sets the environment information used by HDFS.

It overrides hdfs-default.xml, which is bundled in the Hadoop JARs (in Hadoop 2.x, the hadoop-hdfs JAR).

If a value is not set in hdfs-site.xml, the default from hdfs-default.xml is used.

For a detailed description of the HDFS properties, see the following address.

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

[hadoop@master hadoop]$ vi hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
         <value>1</value> <!-- 1 = keep a single copy (pseudo-distributed mode); 3 = fully distributed mode -->
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>backup:50090</value>
    </property>
</configuration>

f. Edit mapred-site.xml

mapred-site.xml sets the environment information used by MapReduce.

It overrides mapred-default.xml, which is bundled in the Hadoop JARs (in Hadoop 2.x, the hadoop-mapreduce-client-core JAR).

If a value is not set in mapred-site.xml, the default from mapred-default.xml is used.

If mapred-site.xml does not exist, copy mapred-site.xml.template and use that

[hadoop@master hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@master hadoop]$ vi mapred-site.xml
<configuration>
       <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
       </property>
</configuration>

g. yarn-site.xml

Normally this file is left unmodified and the defaults apply, but when yarn is selected in mapred-site.xml the following must be added

It specifies the shuffle service used by the MapReduce framework

[hadoop@master hadoop]$ vi yarn-site.xml
<configuration>
       <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
       </property>
       <property>
              <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
              <value>org.apache.hadoop.mapred.ShuffleHandler</value>
       </property>
</configuration>


14. Synchronize the configuration files to the other nodes

* rsync

A program that transfers files to or from a remote server over the network

On the other nodes

# mkdir /usr/local/hadoop-2.7.3
# chown -R hadoop:hadoop /usr/local/hadoop-2.7.3

On the master node

# cd /usr/local/hadoop-2.7.3
# rsync -av . hadoop@backup:/usr/local/hadoop-2.7.3
# rsync -av . hadoop@slave1:/usr/local/hadoop-2.7.3
# rsync -av . hadoop@slave2:/usr/local/hadoop-2.7.3
# cd /usr/local/hadoop-2.7.3/etc/hadoop
# rsync -av . hadoop@backup:/usr/local/hadoop-2.7.3/etc/hadoop
# rsync -av . hadoop@slave1:/usr/local/hadoop-2.7.3/etc/hadoop
# rsync -av . hadoop@slave2:/usr/local/hadoop-2.7.3/etc/hadoop


15. Format the NameNode

When you buy a hard disk or a USB drive, you format it before using it.

Think of this step as initializing a hard disk: the first thing to do when setting up Hadoop is likewise to format.

The NameNode format only needs to be run once, at first setup.

If there is an error message, the configuration files are wrong.

Check and fix them, then run the command again.

[hadoop@master ~]$ cd /usr/local/hadoop-2.7.3/bin
[hadoop@master bin]$
[hadoop@master bin]$ hdfs namenode -format

a. Start the processes

[hadoop@master sbin]$ pwd
/usr/local/hadoop-2.7.3/sbin
[hadoop@master sbin]$ ./start-dfs.sh
[hadoop@master sbin]$ ./start-yarn.sh
[hadoop@master sbin]$ jps
4147 DataNode   -----------------------> slave2
12373 NameNode
12703 SecondaryNameNode  ----------> backup
12851 ResourceManager
13451 Jps
9590 NodeManager
13392 JobHistoryServer
[hadoop@slave1 sbin]$ jps
6001 DataNode
6103 NodeManager
6350 Jps 
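Checking jps output by eye gets tedious with several nodes. The sketch below parses jps-style text and reports which expected daemons are missing; the function name and the sample output are illustrative, and which daemons to expect on a node depends on the roles you assigned to it.

```python
def running_daemons(jps_output):
    """Return the set of process names from jps output lines."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            names.add(parts[1])
    names.discard("Jps")   # jps lists itself
    return names

# Sample text in the format jps prints: "<pid> <name>" per line.
master_jps = """12373 NameNode
12851 ResourceManager
13451 Jps
"""
expected_on_master = {"NameNode", "ResourceManager"}
missing = expected_on_master - running_daemons(master_jps)
print(missing or "all expected daemons are running")
```

On a real node the text would come from running `jps` (for example through `subprocess.run`), with a different expected set for the worker nodes.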

Check in the browser as well

http://master:50070 or

http://master:50070/dfshealth.html should show the file system status once the daemons are running

The job status page (the JobTracker in 1.x; in 2.x this is the YARN ResourceManager) is at http://master:8088/cluster

Check from the console as well

[hadoop@master hadoop]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful
[hadoop@master sbin]$ hdfs dfsadmin -report

★ Running Eclipse

[hadoop@master ~]$ mkdir jar

Start Eclipse

[hadoop@master eclipse]$ pwd
/usr/local/eclipse
[hadoop@master eclipse]$ ./eclipse
Workspace : /home/hadoop/workspace

[Practice]

Project : hadoop

★ Required JARs

/usr/local/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
/usr/local/hadoop-2.7.3/share/hadoop/common/lib/commons-cli-1.2.jar
/usr/local/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.3.jar
/usr/local/hadoop-2.7.3/share/hadoop/mapreduce/lib/log4j-1.2.17.jar

★ HDFS FS shell commands

1. Create a folder : hadoop fs -mkdir [folder]
2. Delete a file/folder : hadoop fs -rm [file]
                          hadoop fs -rm -r [folder]
3. Copy a file : hadoop fs -cp [source path] [destination folder/file]
4. List : hadoop fs -ls /
5. Show file contents : hadoop fs -cat [file]
6. Upload a file : hadoop fs -put [local path] [HDFS path]
If the HDFS path is omitted, the default (home) directory is used
[hadoop@master ~]$ hadoop fs -ls abc.txt
[hadoop@master ~]$ hadoop fs -ls    # lists /user/hadoop
-rw-r--r--  1 hadoop supergroup     14 2017-01-15 23:02 abc.txt
[hadoop@master ~]$ hadoop fs -ls /
drwxr-xr-x  - hadoop supergroup      0 2017-01-15 23:02 /user
[hadoop@master ~]$ hadoop fs -mkdir /test
[hadoop@master ~]$ hadoop fs -ls /
drwxr-xr-x  - hadoop supergroup     0 2017-01-16 00:43 /test
drwxr-xr-x  - hadoop supergroup     0 2017-01-15 23:02 /user

A Linux file

[hadoop@master ~]$ vi apple.txt
[hadoop@master ~]$ hadoop fs -put apple.txt    # upload the Linux file to Hadoop
[hadoop@master ~]$ hadoop fs -ls
-rw-r--r--  1 hadoop supergroup     38 2017-01-16 00:50 apple.txt
-rw-r--r--  1 hadoop supergroup     14 2017-01-15 23:02 abc.txt
[hadoop@master ~]$ hadoop fs -put apple.txt /
[hadoop@master ~]$ hadoop fs -ls /
-rw-r--r--  1 hadoop supergroup     38 2017-01-16 00:50 /apple.txt
drwxr-xr-x  - hadoop supergroup     0 2017-01-16 00:43 /test
drwxr-xr-x  - hadoop supergroup     0 2017-01-15 23:02 /user
[hadoop@master ~]$ hadoop fs -ls /user/hadoop
-rw-r--r--  1 hadoop supergroup     18 2017-01-16 11:18 /user/hadoop/apple.txt
-rw-r--r--  1 hadoop supergroup     24 2017-01-16 10:50 /user/hadoop/abc.txt
[hadoop@master ~]$ hadoop fs -cat /apple.txt
[hadoop@master ~]$ hadoop fs -cat /user/hadoop/abc.txt
[hadoop@master ~]$ hadoop fs -cp /apple.txt /pear.txt
[hadoop@master ~]$ hadoop fs -ls /
Found 4 items
-rw-r--r--  1 hadoop supergroup     38 2017-01-16 00:50 /apple.txt
-rw-r--r--  1 hadoop supergroup     38 2017-01-16 01:02 /pear.txt
drwxr-xr-x  - hadoop supergroup     0 2017-01-16 00:43 /test
drwxr-xr-x  - hadoop supergroup     0 2017-01-15 23:02 /user
[hadoop@master ~]$ hadoop fs -cat /pear.txt
[hadoop@master ~]$ hadoop fs -rm /pear.txt
[hadoop@master ~]$ hadoop fs -rm -r /test
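The listings above show that a path without a leading slash lands under /user/hadoop, while a path with one is taken from the HDFS root. A small sketch of that resolution rule, as it appears in this session (the helper is hypothetical and does not call Hadoop):

```python
import posixpath

def resolve_hdfs_path(path, user="hadoop"):
    """Resolve a path the way the listings above suggest:
    relative paths are taken from /user/<user>, absolute paths are kept."""
    home = posixpath.join("/user", user)
    if not path:
        return home
    return posixpath.normpath(posixpath.join(home, path))

for p in ["", "apple.txt", "/apple.txt", "/user/hadoop/abc.txt"]:
    print(repr(p), "->", resolve_hdfs_path(p))
```

This is why `hadoop fs -put apple.txt` and `hadoop fs -put apple.txt /` create two different files.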

★ Other commands

cat
chgrp
chmod
chown
copyFromLocal
copyToLocal
count
cp
du
dus
expunge
get
getmerge
ls
lsr
mkdir
moveFromLocal
moveToLocal
mv
put
rm
rmr
setrep
stat
tail
test
text
touchz

A node means "one server".

NameNode

 - Stores the metadata of the files that actually reside on the DataNodes

 - Does not store the files themselves

Secondary NameNode

 - Holds the information used for recovery after the NameNode fails

DataNode - The space where the actual data is stored (html, avi, mp3, pdf, NoSQL data, and so on)

In the Hadoop Distributed File System (HDFS), a file is cut into 64 MB pieces called blocks. (64 MB is the Hadoop 1.x default; Hadoop 2.x defaults to 128 MB. The examples below use 64 MB.)

hadoop_img_2

Suppose there is a 150 MB video file named 1.avi.

To store 1.avi in HDFS, the 150 MB file is first divided into 64 MB pieces.

150 MB / 64 MB ≈ 2.34.

So the single 150 MB file becomes three pieces (two of 64 MB and one of 22 MB); this is also described as splitting it into three. Each piece is called a block, or a chunk. In the figure above, 1-1, 1-2, and 1-3 are the blocks.

Compare one server reading or writing 600 MB with the same 600 MB divided among 10 servers: each server then reads or writes only about 60 MB, roughly one block.
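The block arithmetic can be written as a short function. This is a sketch of the calculation only, using the 64 MB block size from the example; the function name is made up for illustration.

```python
import math

def split_into_blocks(file_mb, block_mb=64):
    """Return the size in MB of each block a file is cut into."""
    count = math.ceil(file_mb / block_mb)
    return [min(block_mb, file_mb - i * block_mb) for i in range(count)]

blocks = split_into_blocks(150)
print(blocks)
```

Only the last block is smaller than the block size; it occupies just the space it needs rather than a full 64 MB.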

In HDFS the operator can set the replication policy.

Each block on the HDFS nodes can be replicated 2, 3, or 4 times.

So even if a particular server fails, the block still exists on other servers and the service is unaffected.

This is called No Single Point of Failure.
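The effect of replication on a node failure can be simulated in a few lines. The placement rule here is a plain round-robin chosen for simplicity (real HDFS placement also considers racks and load), it assumes the replication factor does not exceed the number of nodes, and `slave3` is a hypothetical extra node added for the example.

```python
def place_blocks(block_ids, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.
    A simplification: real HDFS placement also considers racks and load."""
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

def readable_blocks(placement, failed_node):
    """Blocks that still have at least one replica on a healthy node."""
    return {b for b, holders in placement.items()
            if any(n != failed_node for n in holders)}

nodes = ["slave1", "slave2", "slave3"]      # slave3 is a hypothetical extra node
placement = place_blocks(["1-1", "1-2", "1-3"], nodes, replication=2)
print(placement)
print(sorted(readable_blocks(placement, failed_node="slave1")))
```

With two replicas every block survives the loss of slave1; with a replication factor of 1, any block stored only on the failed node would be unreadable.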

HDFS basically has two kinds of servers (nodes).

They are the NameNode and the DataNode.

The NameNode role is in turn divided in two.

There is the NameNode (the master) and the Secondary NameNode.

No actual files exist on the NameNode.

The NameNode (master) holds information about the files stored on the DataNodes.

This is called metadata.

In practice it keeps the file information in a tree structure.

hadoop_img_3

Although the NameNode (master) does not store the files themselves, it stores the information about the files held on the DataNodes.

Books describe this as "the NameNode manages the file system namespace".

This information is kept in memory so that, when a client looks up a file to read or write, the right DataNodes can be found quickly.

Memory contents are lost when the machine reboots, so the metadata is also kept on the local disk (hard disk), where it is stored persistently as a namespace image and an edit log.

This is how the NameNode knows which DataNodes hold every block of a given file.

Because the NameNode remembers the metadata for the DataNodes, files can be located.

If the NameNode suffers an error or failure, however, none of the files in HDFS can be used.

Even though the files are physically stored on the DataNodes, there is no way to find them.

The Secondary NameNode exists to guard against this kind of failure.

The Secondary NameNode is not a standby that takes over when the master NameNode fails.

Instead, it provides the information from which the NameNode can be recovered when it is restarted after a failure. To do so, it periodically fetches and stores the NameNode's (master's) namespace image, edit log, and other required files such as journal files.
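The relationship between the namespace image and the edit log can be sketched as a snapshot plus a list of changes replayed on top of it. This is a conceptual model only: the operation names, tuple layout, and block ids are invented for the example and do not reflect the real on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to the in-memory namespace."""
    op, path, *rest = edit
    if op == "create":
        namespace[path] = rest[0]          # list of block ids for the file
    elif op == "delete":
        namespace.pop(path, None)

def recover(image, edit_log):
    """Rebuild the namespace: start from the image, replay the edit log."""
    namespace = dict(image)
    for edit in edit_log:
        apply_edit(namespace, edit)
    return namespace

def checkpoint(image, edit_log):
    """Merge the log into a new image and start an empty log."""
    return recover(image, edit_log), []

image = {"/user/hadoop/abc.txt": ["blk_1"]}
edit_log = [
    ("create", "/apple.txt", ["blk_2"]),
    ("create", "/pear.txt", ["blk_3"]),
    ("delete", "/pear.txt"),
]
new_image, new_log = checkpoint(image, edit_log)
print(new_image)
```

Merging periodically keeps the edit log short, so a restarted NameNode has less to replay; that merged copy is also what makes recovery from the secondary's data possible.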

The DataNode is where the actual files are stored.

Video files, txt files, Excel files, PDFs, HBase data, mp3 files: every kind of file is stored there.

Files are stored on DataNodes in units of blocks, and the default block size is 64 MB.

These blocks are also copied to other DataNodes according to the replication policy (4 copies here).

Blocks are stored on several DataNodes according to an internal policy, so when a particular node (server) fails, the service keeps running by reading the blocks from other nodes.

This is No Single Point of Failure.

When the NameNode fails you have to find some way to deal with it, but with DataNodes there is no such worry.

All DataNode servers communicate with the NameNode, and they read and write block files at the request of clients.