[Purpose]======================================================
Install R and a Hadoop environment on Ubuntu 16,
then run simple examples with rhdfs and rmr2.
[Problems]======================================================
(Problem 1): library(rmr2) prints this message:
Please review your hadoop settings. See help(hadoop.settings)
(Problem 2): after library(rhdfs),
hdfs.init() prints this message:
17/01/11 17:20:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
My guess is that Hadoop streaming is configured incorrectly?
[Installation steps]====================================================
Start Hadoop:
cd ~/hadoop && sbin/start-all.sh
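To confirm that the daemons actually came up after start-all.sh, the output of jps can be inspected. The helper below is only a sketch: the function name missing_daemons is hypothetical, and the daemon list assumes a single-node Hadoop 2.x setup where all five daemons run on the same machine.

```shell
# Sketch: report which expected Hadoop daemons are absent from `jps` output.
# Assumption: single-node Hadoop 2.x; adjust the list for a real cluster.
missing_daemons() {
    # $1: text in the format printed by `jps` ("<pid> <ClassName>" per line)
    jps_output=$1
    missing=""
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
        if ! printf '%s\n' "$jps_output" | grep -qw "$daemon"; then
            missing="$missing $daemon"
        fi
    done
    # Print the missing daemons separated by single spaces (empty if none)
    echo $missing
}

# Usage on the master:
#   missing_daemons "$(jps)"
```

An empty result means every daemon in the list was found; any name printed is worth looking up in hadoop/logs before trying rmr2.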
-----------------------------------------------------------------------------
R only needs to be installed on the master node:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install r-base
Java settings-----------------------------------------------------------
echo $JAVA_HOME
sudo JAVA_HOME=/usr/lib/jvm/jdk/ R CMD javareconf
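Before running javareconf it may help to check that the directory passed as JAVA_HOME really contains a JDK. This is a minimal sketch; check_java_home is a hypothetical helper name, and testing only for bin/java and bin/javac is an assumption about what is enough (javareconf performs its own, more thorough checks).

```shell
# Sketch: sanity-check a candidate JAVA_HOME before `R CMD javareconf`.
check_java_home() {
    # $1: directory to test, e.g. /usr/lib/jvm/jdk/ as used above
    dir=${1%/}   # drop one trailing slash so messages look tidy
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is empty"
        return 1
    fi
    # A JRE-only install has bin/java but no bin/javac
    for tool in java javac; do
        if [ ! -x "$dir/bin/$tool" ]; then
            echo "missing executable: $dir/bin/$tool"
            return 1
        fi
    done
    echo "ok: $dir"
}

# Usage:
#   check_java_home "$JAVA_HOME"
#   check_java_home /usr/lib/jvm/jdk/
```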
In R-------------------------------------------------------------
Start R:
sudo R
install.packages(c('codetools','R','Rcpp','RJSONIO','bitops','digest','functional','stringr','plyr','reshape2','rJava','caTools'))
Download rmr2 (used for MapReduce and rhbase) and rhdfs:
wget --no-check-certificate https://raw.github.com/RevolutionAnalytics/rmr2/3.3.0/build/rmr2_3.3.0.tar.gz
wget --no-check-certificate https://raw.github.com/RevolutionAnalytics/rhdfs/master/build/rhdfs_1.0.8.tar.gz
In R:
---------------------------------------------------------------------------------------
$ sudo R
install.packages('/home/hduser/rhdfs_1.0.8.tar.gz', repos = NULL, type = 'source')
install.packages('/home/hduser/rmr2_3.3.0.tar.gz', repos = NULL, type = 'source')
The installation video I followed:
https://www.youtube.com/watch?v=w70h_u8qoHM&t=680s
[Path settings / online references]=====================================================
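I went through several online references,
but none of them solved the problem.
Many of the discussions turn out to be about the HADOOP_STREAMING path setting.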
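@Reference 1
http://stackoverflow.com/questions/29682432/r-mapreduce-library-rmr2-shows-a-warning-message-when-loaded
This post says to reset the paths with Sys.setenv in R, but its paths are completely different from mine,
so I don't think it is my problem either.
@Reference 2
https://github.com/RevolutionAnalytics/RHadoop/issues/122
I haven't started reading this one yet; the English is hard going for me.
@Reference 3
https://github.com/RevolutionAnalytics/rmr2/issues/155
This one looks very similar to my problem, but I still don't really understand it, and the paths it sets are also different from mine.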
When I run:
> small.ints = to.dfs(1:10)
> mapr = mapreduce(input = small.ints,
    map = function(k,v) cbind(v,v^2))
I get:
Streaming Command Failed!
Error in ...
hadoop streaming failed with error code 5
I don't know what this means.
These are the paths I set:
Sys.setenv(HADOOP_HOME='/home/hduser/hadoop')
Sys.setenv(HADOOP_PREFIX='/home/hduser/hadoop')
Sys.setenv(HADOOP_CMD='/home/hduser/hadoop/bin/hadoop')
Sys.setenv(HADOOP_STREAMING='/home/hduser/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar')
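One way to rule out a plain typo in these paths is to test, from a shell, that the files they name exist. The sketch below is an assumption-laden check, not a diagnosis: check_rmr2_paths and find_streaming_jars are hypothetical helper names, and passing this check only shows that the paths exist, not that the streaming job will succeed.

```shell
# Sketch: verify the files named by HADOOP_CMD and HADOOP_STREAMING exist.
check_rmr2_paths() {
    # $1: value used for HADOOP_CMD, $2: value used for HADOOP_STREAMING
    status=0
    if [ ! -f "$1" ] || [ ! -x "$1" ]; then
        echo "HADOOP_CMD is not an executable file: $1"
        status=1
    fi
    if [ ! -f "$2" ]; then
        echo "HADOOP_STREAMING jar not found: $2"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "both paths exist"
    fi
    return "$status"
}

# List every streaming jar under a Hadoop install, to compare with the setting.
find_streaming_jars() {
    # $1: Hadoop install directory, e.g. /home/hduser/hadoop
    find "$1" -type f -name 'hadoop-streaming*.jar' 2>/dev/null
}

# Usage with the paths from this post:
#   check_rmr2_paths /home/hduser/hadoop/bin/hadoop \
#       /home/hduser/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar
#   find_streaming_jars /home/hduser/hadoop
```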
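I don't know which part is wrong...
I also can't make sense of the errors in hadoop/logs.
Could someone check whether my HADOOP_STREAMING setting is wrong, or tell me how to read the errors,
or where things went wrong?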
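For reading the logs, a rough first pass is to pull out only the lines that look like errors. This is a sketch under assumptions: scan_logs is a hypothetical helper, the pattern ERROR|FATAL|Exception is just a guess at what is worth seeing, and the userlogs location mentioned in the usage comment assumes the default YARN log directory has not been changed.

```shell
# Sketch: show the last few error-looking lines found under a log directory.
scan_logs() {
    # $1: log directory, $2: maximum number of matching lines to show (default 20)
    # Output format is "file:line_number:text", as produced by grep -rn
    grep -rnE 'ERROR|FATAL|Exception' "$1" 2>/dev/null | tail -n "${2:-20}"
}

# Usage:
#   scan_logs ~/hadoop/logs
#   scan_logs ~/hadoop/logs/userlogs 50   # per-container stdout/stderr, if present
```

For a failed streaming job, the stderr files of the job's containers are often more informative than the daemon logs, since that is where output from the R processes would end up.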
--