mongodb - HBase or Mongo for an Analytics DB if already using Hadoop?
I have a Hadoop cluster where I store tons of logs and run Pig scripts to calculate aggregated analytics. I also have a Mongo cluster where we store our production data.
I've been put in a position where I need to do a lot of one-off analytics queries, or enable others to do them. These queries need to use both the production data and the log data together, so whatever I go with, I'd like to have everything in one place. The log data is in JSON and is about 10x the size of the prod data. Here are the pros/cons of Mongo and HBase that I'm seeing:
Mongo pros / HBase cons:
- Since the log data is in JSON, I can get it into Mongo pretty easily, and I can do so in real time as it comes in through fluentd.
- Most people I work with have experience writing Mongo queries from needing to work with the prod data, so an analytics DB on Mongo would be simple for them to use.
- I know less about HBase than about Mongo.
- I have no idea how easy or difficult it is to get data from JSON or Mongo into HBase. I imagine it isn't bad, but I don't see much documentation.
HBase pros / Mongo cons:
- My log data is much bigger than my prod data, so storing the log data in both Hadoop and Mongo would be way more expensive than storing the prod data in both Hadoop and Mongo.
- I can build HBase on top of my already running Hadoop cluster and fit the prod data in there without adding many machines. If I went with Mongo, I'd need a whole new Mongo cluster.
- I could use Phoenix on top of HBase to allow simple SQL syntax for accessing our data, though I'm not sure how unwieldy that gets with multi-level, document-based data.
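To make the multi-level concern concrete, here is a minimal Python sketch of one possible approach: flattening each nested log document into dotted column names before loading it into a flat, SQL-style table. The `flatten` helper and the sample field names are hypothetical, and this only illustrates the idea; it is not a Phoenix or HBase feature.

```python
import json


def flatten(doc, parent_key="", sep="."):
    """Flatten a nested dict into a single-level dict with dotted keys."""
    flat = {}
    for key, value in doc.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into sub-documents, carrying the path so far.
            flat.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            # Assumption: lists are stored as JSON strings, not exploded into rows.
            flat[full_key] = json.dumps(value)
        else:
            flat[full_key] = value
    return flat


# Hypothetical log line; the real log schema is not given in the question.
log_line = (
    '{"program": "checkout", "ts": 1400000000000, '
    '"user": {"id": 42, "geo": {"country": "US"}}, "tags": ["a", "b"]}'
)
row = flatten(json.loads(log_line))
```

Each key in `row` (for example `user.geo.country`) could then map to one column. Deeply nested or highly variable documents would produce many sparse columns, which is the kind of unwieldiness I'm worried about.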
I know very little about HBase currently, and I wouldn't consider myself a Mongo expert either, so I'm sure I'm missing a lot.
So, what am I missing, and which is right for my situation?
First of all, you should use what you can handle. Therefore, MongoDB seems to be a good choice, especially when your data is already in JSON format.
On the other hand, I have used HBase for quite a while, and its read performance is amazing even with a lot of rows. I also don't know if there is a good and fast integration of MongoDB with Hadoop. HBase is the Hadoop database, so it is predestined to work together with Hadoop.
If your logs can be indexed (in the HBase rowkey) by:
producing_program_identifier, timestamp, ...
then HBase would work quite well for your query pattern. If you decide on HBase, use the Phoenix framework; it will save you a lot of time by letting you use familiar interfaces such as JDBC and SQL-like queries. It also provides simple aggregation functions (count, avg, max, min), which may be sufficient.
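As a rough illustration of why such a rowkey suits this query pattern: HBase keeps rows sorted by the raw bytes of the rowkey, so all entries of one program within a time window sit next to each other and can be read with a single range scan. The Python sketch below only simulates that ordering with a sorted list; `make_rowkey`, `scan_range`, the NUL separator, and the sample program names are assumptions made for the example, not an HBase API.

```python
import struct
from bisect import bisect_left


def make_rowkey(program_id: str, ts_millis: int) -> bytes:
    """Compose <program id> + NUL + <8-byte big-endian timestamp>."""
    # The NUL separator keeps "billing" keys apart from "billing2" keys.
    # Big-endian packing makes byte order match numeric timestamp order.
    return program_id.encode("utf-8") + b"\x00" + struct.pack(">Q", ts_millis)


def scan_range(sorted_keys, program_id, start_ms, stop_ms):
    """Return keys for program_id with start_ms <= timestamp < stop_ms."""
    lo = bisect_left(sorted_keys, make_rowkey(program_id, start_ms))
    hi = bisect_left(sorted_keys, make_rowkey(program_id, stop_ms))
    return sorted_keys[lo:hi]


# Simulated table: a sorted list stands in for HBase's byte-ordered rows.
keys = sorted([
    make_rowkey("billing", 1000),
    make_rowkey("billing", 2000),
    make_rowkey("billing", 3000),
    make_rowkey("search", 1500),
    make_rowkey("billing2", 1500),
])

hits = scan_range(keys, "billing", 1000, 3000)
```

Here `hits` contains only the two `billing` keys with timestamps 1000 and 2000, found without touching the rows of other programs. Queries that do not start from the program identifier would not benefit from this layout and would need a full scan or a secondary index.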