mongodb - HBase or Mongo for an Analytics DB if already using Hadoop? -


I have a Hadoop cluster where we store tons of logs and run Pig scripts to calculate aggregated analytics. I also have a Mongo cluster where we store production data.

I've been put in a position where I need to run a lot of one-off analytics queries, or enable others to run them. These queries need to use both the production data and the log data together, so whatever I go with, I'd like to have everything in one place. The log data is in JSON and is about 10x the size of the prod data. Here are the pros/cons of Mongo and HBase that I'm seeing:

Mongo pros / HBase cons:

  1. Since the log data is in JSON, I can load it into Mongo pretty easily, and can do so in real time as it comes in through fluentd.
  2. Most people I work with have experience writing Mongo queries from needing to work with the prod data, so an analytics DB on Mongo would be simple for them to use.
  3. I know less about HBase than I do about Mongo.
  4. I have no idea how easy or difficult it is to get data that is in JSON or in Mongo into HBase. I imagine it isn't bad, but I don't see much documentation.
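
As a rough illustration of point 1, here is a minimal Python sketch of turning newline-delimited JSON log lines into batches of documents. The function name `batch_json_logs`, the default batch size, and the assumption that each log record is a single JSON object per line are all hypothetical. Each batch could then be handed to a driver call such as pymongo's `Collection.insert_many`, which is not shown here.

```python
import json
from typing import Iterable, Iterator, List


def batch_json_logs(lines: Iterable[str], batch_size: int = 1000) -> Iterator[List[dict]]:
    """Parse newline-delimited JSON log lines and yield them in batches.

    Blank or malformed lines are skipped rather than aborting the whole load.
    """
    batch: List[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(doc, dict):
            continue  # only JSON objects map onto documents
        batch.append(doc)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # emit the final, possibly smaller, batch
```

Batching keeps the number of round trips to the database low, which matters when the log volume is roughly ten times the production data.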

HBase pros / Mongo cons:

  1. My log data is much bigger than my prod data, so storing the log data in both Hadoop and Mongo would be way more expensive than storing the prod data in both Hadoop and Mongo.
  2. I can build HBase on top of the already running Hadoop cluster and probably fit the prod data in there without adding many machines. If I went with Mongo, I'd need a whole new Mongo cluster.
  3. I could use Phoenix on top of HBase to allow simple SQL syntax for accessing our data, though I'm not sure how unwieldy that would be with multi-level document-based data.
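
To make the concern in point 3 concrete, one common way to fit multi-level documents into a flat, SQL-style table is to flatten nested keys into column names. The sketch below is only an assumption about how that could be done: the `flatten` helper, the `_` separator, and the sample field names are hypothetical and not part of Phoenix or HBase.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into a single level, joining keys with `sep`.

    Lists are left untouched, since arrays have no obvious flat column mapping.
    """
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into sub-documents
        else:
            flat[name] = value
    return flat


# Hypothetical log record with two levels of nesting.
record = {"user": {"id": 7, "geo": {"city": "Oslo"}}, "status": 200}
print(flatten(record))
```

Nested objects flatten cleanly, but arrays of sub-documents do not, which is where a document store tends to stay more convenient than a flat table.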

I know little about HBase currently, and I wouldn't consider myself a Mongo expert, so I'm probably missing a lot.

So, what am I missing, and which is right for my situation?

First of all, you should use what you can handle. Therefore, MongoDB seems a good choice, especially when your data is already in JSON format.

On the other hand, I have used HBase for quite a while and its read performance is amazing even with a lot of rows, and I don't know if there is a good and fast integration of MongoDB with Hadoop. HBase is the Hadoop database, so it is predestined to work together with Hadoop.

If your logs can be indexed by the following (in the HBase rowkey):

producing_program_identifier, timestamp, ... 

HBase would work quite well for that query pattern. If you decide on HBase, use the Phoenix framework, which will save you time by offering familiar interfaces such as JDBC and SQL-like queries. It also provides simple aggregation functions (COUNT, AVG, MAX, MIN) that may be sufficient.
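
A minimal sketch of the composite rowkey idea, assuming a string program identifier and a millisecond timestamp: HBase orders rows by the raw bytes of the rowkey, so encoding the timestamp as a fixed-width big-endian integer keeps each program's entries in time order. The function names, the NUL separator, and the 8-byte encoding are my own assumptions for illustration, not a prescribed layout.

```python
import struct
from typing import Tuple


def make_rowkey(program_id: str, timestamp_ms: int) -> bytes:
    """Build a rowkey: program id, a NUL separator, then an 8-byte big-endian timestamp."""
    if "\x00" in program_id:
        raise ValueError("program_id must not contain NUL bytes")
    # Big-endian unsigned integers compare bytewise in the same order as numerically.
    return program_id.encode("utf-8") + b"\x00" + struct.pack(">Q", timestamp_ms)


def scan_range(program_id: str, start_ms: int, stop_ms: int) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering one program's logs in [start_ms, stop_ms)."""
    return make_rowkey(program_id, start_ms), make_rowkey(program_id, stop_ms)


# Hypothetical usage: all "web" logs between two timestamps.
start_row, stop_row = scan_range("web", 1_000, 2_000)
```

With this layout, a query for one program over a time window becomes a single contiguous range scan instead of a full table scan.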

