optimization - Handling optional/empty data in MongoDB -
i remember reading somewhere mongo engine more confortable when entire structure of document in place in case of update, here question.
when dealing "empty" data, example when inserting empty string, should default null
, ""
or not insert @ ?
{ _id: objectid("5192b6072fda974610000005"), description: "" }
or
{ _id: objectid("5192b6072fda974610000005"), description: null }
or
{ _id: objectid("5192b6072fda974610000005") }
you have remember description
field may or may not filled in every document (based on user input).
introduction
if document doesn't have value, db considers value null. suppose database following documents:
{ "_id" : objectid("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" } { "_id" : objectid("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null } { "_id" : objectid("5192d2441698aa96f0690d98"), "a" : 1 }
if create query find documents field desc different null, 1 document:
db.test.find({desc: {$ne: null}}) // output: { "_id" : objectid("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
the database doesn't differ documents without desc field , documents desc field value null. 1 more test:
db.test.find({desc: null}) // output: { "_id" : objectid("5192d2441698aa96f0690d98"), "a" : 1 } { "_id" : objectid("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
but differences ignored in queries, because, shown in last example above, fields still saved on disk , you'll receive documents same structure of documents sent mongodb.
question
when dealing "empty" data, example when inserting empty string, should default null, "" or not insert @ ?
there isn't difference {desc: null}
{}
, because of operators have same result. should pay special attention these 2 operators:
i'd save documents without desc field, because operators continue work expected , i'd save space.
padding factor
if documents in database grow frequently, mongodb might need move documents during update, because there isn't enough space in previous document place. prevent moving documents around, mongodb allocates space each document.
the ammount of space allocated mongodb per document controlled padding factor. cannot (and don't need to) choose padding factor, because mongodb adaptively learn it, can mongodb preallocating internal space each document filling possible future fields null values. difference small (depending on application) , might smaller after mongodb learn best padding factor.
sparse indexes
this section isn't important specific problem right now, may when face similar problems.
if create unique index on field desc, wouldn't able save more 1 document same value , in previous database, had more 1 document same value on field desc. let's try create unique index in previous presented database , see error get:
db.test.ensureindex({desc: 1}, {unique: true}) // output: { "err" : "e11000 duplicate key error index: test.test.$desc_1 dup key: { : null }", "code" : 11000, "n" : 0, "connectionid" : 3, "ok" : 1 }
if want able create unique index on field and let documents have field empty, should create sparse index. let's try create unique index again:
// no errors time: db.test.ensureindex({desc: 1}, {unique: true, sparse: true})
so far, good, why explaining this? because there obscure behaviour sparse indexes. in following query, expect have all documents sorted desc.
db.test.find().sort({desc: 1}) // output: { "_id" : objectid("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null } { "_id" : objectid("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
the result seems weird. happened missing document? let's try query without sorting it:
{ "_id" : objectid("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" } { "_id" : objectid("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null } { "_id" : objectid("5192d2441698aa96f0690d98"), "a" : 1 }
all documents returned time. what's happening? it's simple, not obvious. when sort result desc, use sparse index created , there no entries documents haven't desc field. following query show use of index sort result:
db.test.find().sort({desc: 1}).explain().cursor // output: "btreecursor desc_1"
we can skip index using hint:
db.test.find().sort({desc: 1}).hint({$natural: 1}) // output: { "_id" : objectid("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null } { "_id" : objectid("5192d2441698aa96f0690d98"), "a" : 1 } { "_id" : objectid("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
summary
- sparse unique indexes don't work if include
{desc: null}
- sparse unique indexes don't work if include
{desc: ""}
- sparse indexes might change result of query
Comments
Post a Comment