java - Hadoop Reduce Output File Never Created for Large Data
I'm writing an application in Java on Hadoop 1.1.1 (Ubuntu) that compares strings in order to find the longest common substrings. I've got both the map and reduce phases running successfully on small data sets, but whenever I increase the size of the input, the reduce output never appears in the target output directory. It doesn't complain at all, which makes it even weirder. I'm running everything in Eclipse, and I have one mapper and one reducer.
My reducer finds the longest common substring in a collection of strings and then emits the substring as a key and the index of the string that contained it as a value. Here's a short example.
input data
    0: alphaa
    1: alphab
    2: alzha
Output emitted

    key: alpha value: 0
    key: alpha value: 1
    key: al value: 0
    key: al value: 1
    key: al value: 2
The first two input strings both share "alpha" as a common substring, while all three share "al". I end up indexing the substrings and writing them to a database once the process is complete.
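The comparison itself is the standard dynamic-programming longest-common-substring computation. A minimal sketch of that step (the method name and the way it is wired into my reducer are simplified here, not my exact code):

    public static String longestCommonSubstring(String a, String b) {
        // len[i][j] = length of the longest common suffix of a[0..i) and b[0..j)
        int[][] len = new int[a.length() + 1][b.length() + 1];
        int best = 0;     // length of the longest common substring found so far
        int bestEnd = 0;  // end index (exclusive) of that substring within a
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > best) {
                        best = len[i][j];
                        bestEnd = i;
                    }
                }
            }
        }
        return a.substring(bestEnd - best, bestEnd);
    }

For the example above, longestCommonSubstring("alphaa", "alphab") returns "alpha", and folding "alzha" into the result shrinks it to "al".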
An additional observation: I can see the intermediate files being created in the output directory, but the reduced data is never put into an output file.
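For completeness, the driver follows the standard new-API setup. A simplified version (SubstringJob, SubstringMapper, and SubstringReducer are placeholder names, not my actual classes):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SubstringJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "longest common substrings");
            job.setJarByClass(SubstringJob.class);   // also avoids the "no job jar file set" warning
            job.setMapperClass(SubstringMapper.class);
            job.setReducerClass(SubstringReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));   // must not already exist
            // waitForCompletion(true) blocks until the job finishes,
            // so the reduce output should be committed before this returns.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }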
I've pasted the Hadoop output log below. It claims to have a number of output records from the reducer, yet they seem to disappear. Any suggestions are appreciated.
    Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    Total input paths to process : 1
    Running job: job_local_0001
    setsid exited with exit code 0
    Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@411fd5a3
    Snappy native library not loaded
    io.sort.mb = 100
    data buffer = 79691776/99614720
    record buffer = 262144/327680
    map 0% reduce 0%
    Spilling map output: record full = true
    bufstart = 0; bufend = 22852573; bufvoid = 99614720
    kvstart = 0; kvend = 262144; length = 327680
    Finished spill 0
    Starting flush of map output
    Finished spill 1
    Merging 2 sorted segments
    Down to the last merge-pass, with 2 segments left of total size: 28981648 bytes
    Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
    Task attempt_local_0001_m_000000_0 done.
    Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3aff2f16
    Merging 1 sorted segments
    Down to the last merge-pass, with 1 segments left of total size: 28981646 bytes
    map 100% reduce 0%
    reduce > reduce
    map 100% reduce 66%
    reduce > reduce
    map 100% reduce 67%
    reduce > reduce
    reduce > reduce
    map 100% reduce 68%
    reduce > reduce
    reduce > reduce
    reduce > reduce
    map 100% reduce 69%
    reduce > reduce
    reduce > reduce
    map 100% reduce 70%
    reduce > reduce
    Job complete: job_local_0001
    Counters: 22
      File Output Format Counters
        Bytes Written=14754916
      FileSystemCounters
        FILE_BYTES_READ=61475617
        HDFS_BYTES_READ=97361881
        FILE_BYTES_WRITTEN=116018418
        HDFS_BYTES_WRITTEN=116746326
      File Input Format Counters
        Bytes Read=46366176
      Map-Reduce Framework
        Reduce input groups=27774
        Map output materialized bytes=28981650
        Combine output records=0
        Map input records=4629524
        Reduce shuffle bytes=0
        Physical memory (bytes) snapshot=0
        Reduce output records=832559
        Spilled Records=651304
        Map output bytes=28289481
        CPU time spent (ms)=0
        Total committed heap usage (bytes)=2578972672
        Virtual memory (bytes) snapshot=0
        Combine input records=0
        Map output records=325652
        SPLIT_RAW_BYTES=136
        Reduce input records=27774
    reduce > reduce
    reduce > reduce
I put the reduce() and map() logic inside a try-catch block, with the catch block incrementing a counter whose group is "Exception" and whose name is the exception message. This gives me a quick way (by looking at the counter list) to see what exceptions, if any, were thrown.
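In code, that pattern looks roughly like this inside the new-API reduce() (the class name and key/value types are illustrative); the same works in the mapper with its Context:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SubstringReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            try {
                // ... longest-common-substring logic and context.write(...) here ...
            } catch (Exception e) {
                // One counter per distinct exception message, under the
                // "Exception" group; shows up in the job's counter list.
                context.getCounter("Exception", e.getMessage()).increment(1);
            }
        }
    }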