java - Hadoop Reduce Output File Never Created for Large Data
I'm writing an application in Java on Hadoop 1.1.1 (Ubuntu) that compares strings in order to find the longest common substrings. I've got both the map and reduce phases running successfully on small data sets, but whenever I increase the size of the input, the reduce output never appears in the target output directory. It doesn't complain at all, which makes it even weirder. I'm running everything in Eclipse, and I have one mapper and one reducer.
My reducer finds the longest common substring in a collection of strings and then emits the substring as a key and the index of the string that contained it as a value. Here's a short example.
input data
    0: alphaa
    1: alphab
    2: alzha
Output emitted

    key: alpha value: 0
    key: alpha value: 1
    key: al value: 0
    key: al value: 1
    key: al value: 2
The first two input strings both share "alpha" as a common substring, while all three share "al". I end up indexing the substrings and writing them to a database once the process is complete.
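The comparison itself is the standard dynamic-programming longest-common-substring computation. A minimal sketch of that step (the method name and the way it is wired into my reducer are simplified here, not my exact code):

    public static String longestCommonSubstring(String a, String b) {
        // len[i][j] = length of the longest common suffix of a[0..i) and b[0..j)
        int[][] len = new int[a.length() + 1][b.length() + 1];
        int best = 0;     // length of the longest common substring found so far
        int bestEnd = 0;  // end index (exclusive) of that substring within a
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > best) {
                        best = len[i][j];
                        bestEnd = i;
                    }
                }
            }
        }
        return a.substring(bestEnd - best, bestEnd);
    }

For the example above, longestCommonSubstring("alphaa", "alphab") returns "alpha", and folding "alzha" into the result shrinks it to "al".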
An additional observation: I can see the intermediate files being created in the output directory, but the reduced data is never put into an output file.
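For completeness, the driver follows the standard new-API setup. A simplified version (SubstringJob, SubstringMapper, and SubstringReducer are placeholder names, not my actual classes):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SubstringJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "longest common substrings");
            job.setJarByClass(SubstringJob.class);   // also avoids the "no job jar file set" warning
            job.setMapperClass(SubstringMapper.class);
            job.setReducerClass(SubstringReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));   // must not already exist
            // waitForCompletion(true) blocks until the job finishes,
            // so the reduce output should be committed before this returns.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }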
I've pasted the Hadoop output log below. It claims to have a number of output records from the reducer, yet they seem to disappear. Any suggestions are appreciated.
    Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    Total input paths to process : 1
    Running job: job_local_0001
    setsid exited with exit code 0
    Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@411fd5a3
    Snappy native library not loaded
    io.sort.mb = 100
    data buffer = 79691776/99614720
    record buffer = 262144/327680
    map 0% reduce 0%
    Spilling map output: record full = true
    bufstart = 0; bufend = 22852573; bufvoid = 99614720
    kvstart = 0; kvend = 262144; length = 327680
    Finished spill 0
    Starting flush of map output
    Finished spill 1
    Merging 2 sorted segments
    Down to the last merge-pass, with 2 segments left of total size: 28981648 bytes
    Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
    Task attempt_local_0001_m_000000_0 done.
    Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3aff2f16
    Merging 1 sorted segments
    Down to the last merge-pass, with 1 segments left of total size: 28981646 bytes
    map 100% reduce 0%
    reduce > reduce
    map 100% reduce 66%
    reduce > reduce
    map 100% reduce 67%
    reduce > reduce
    reduce > reduce
    map 100% reduce 68%
    reduce > reduce
    reduce > reduce
    reduce > reduce
    map 100% reduce 69%
    reduce > reduce
    reduce > reduce
    map 100% reduce 70%
    reduce > reduce
    Job complete: job_local_0001
    Counters: 22
      File Output Format Counters
        Bytes Written=14754916
      FileSystemCounters
        FILE_BYTES_READ=61475617
        HDFS_BYTES_READ=97361881
        FILE_BYTES_WRITTEN=116018418
        HDFS_BYTES_WRITTEN=116746326
      File Input Format Counters
        Bytes Read=46366176
      Map-Reduce Framework
        Reduce input groups=27774
        Map output materialized bytes=28981650
        Combine output records=0
        Map input records=4629524
        Reduce shuffle bytes=0
        Physical memory (bytes) snapshot=0
        Reduce output records=832559
        Spilled Records=651304
        Map output bytes=28289481
        CPU time spent (ms)=0
        Total committed heap usage (bytes)=2578972672
        Virtual memory (bytes) snapshot=0
        Combine input records=0
        Map output records=325652
        SPLIT_RAW_BYTES=136
        Reduce input records=27774
    reduce > reduce
    reduce > reduce
I put the reduce() and map() logic inside a try-catch block, with the catch block incrementing a counter whose group is "Exception" and whose name is the exception message. This gives me a quick way (by looking at the counter list) to see what exceptions, if any, were thrown.
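In code, that pattern looks roughly like this inside the new-API reduce() (the class name and key/value types are illustrative); the same works in the mapper with its Context:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SubstringReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            try {
                // ... longest-common-substring logic and context.write(...) here ...
            } catch (Exception e) {
                // One counter per distinct exception message, under the
                // "Exception" group; shows up in the job's counter list.
                context.getCounter("Exception", e.getMessage()).increment(1);
            }
        }
    }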