Why does MongoDB db.col.count() show more documents than were inserted?


Using the Java driver for MongoDB, I am trying to insert 25,637,015 documents into a MongoDB cluster. The documents are retrieved from a SQL Server database and inserted into an empty MongoDB sharded collection (called col) in a multithreaded fashion (8 concurrent threads). The process took 2 hours. What is interesting and puzzling is what went on for over 6(!) hours after the program had finished.

Firstly, the hard drives in the cluster node computers continued to spin like crazy. Secondly, and more importantly, db.col.count(), run at intervals of less than a second, continued to return different results:

mongos> db.col.count()
25694898
mongos> db.col.count()
25694917
mongos> db.col.count()
25695154
mongos> db.col.count()
25695207
mongos> db.col.count()
25695422
mongos> db.col.count()
25695493
mongos> db.col.count()
25696024
mongos> db.col.count()
25696130
mongos> db.col.count()
25698565
mongos> db.col.count()
25695145

What is more intriguing is that these counters, while going up and down, were greater than the number of inserted documents: 25,637,015. Had they been smaller, I could speculate that the documents went into some sort of queue and were still being processed, but greater?!
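To put a number on the gap, here is a small Java sketch that subtracts the inserted total from a few of the counts sampled above. The class and method names are illustrative only; the figures are the ones from the shell session. The surplus stays in the range of roughly 58,000 to 62,000 documents, about 0.2% of the collection.

```java
// Illustrative helper, not part of any MongoDB API.
class CountSurplus {
    // Number of documents the loader actually inserted.
    static final long INSERTED = 25_637_015L;

    // How many documents count() reported beyond what was inserted.
    static long surplus(long reported) {
        return reported - INSERTED;
    }

    public static void main(String[] args) {
        // First, largest, and last values from the shell session.
        long[] samples = {25_694_898L, 25_698_565L, 25_695_145L};
        for (long reported : samples) {
            System.out.println(reported + " -> +" + surplus(reported));
        }
        // First line printed: 25694898 -> +57883
    }
}
```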

Like I said, after 6 hours it stabilized: the hard drives stopped spinning and mongos> db.col.count() returned the correct number: 25637015.

In case it is of importance: I have 2 replica sets in the sharded cluster. Each replica set has 2 data nodes and 1 arbiter node. I run 3 config servers and 3 mongos. These are spread between 4 CentOS boxes (virtual) running on Windows hosts. The source SQL Server is on yet another physical machine. The balancer was not disabled for the duration of the insert or at any time after. The MongoDB version is 2.2.6, 64-bit.

Any idea what MongoDB was doing for 6 hours after the Java program had finished inserting? And why was the count so high?

Thank you.

For most of the drivers, MongoDB uses memory to enhance write performance. An insertion first goes to memory and the journal, then returns at once. At that moment the data is not on disk yet. For more information, have a look at the Write Concern section of the MongoDB manual. That's why the collection keeps growing.

As for count returning more than the accurate number, there's a JIRA issue for it. See if it answers your question. Unfortunately it's not fixed yet.

Edit:
As for the time spent, it's hard to be sure. It depends on your hardware, the disk in particular. It would be helpful to run mongostat and mongotop and see what's going on. Once you know whether the insertion is still running, you'll know whether the count result makes sense. Here I found a related JIRA issue explaining the count operation in sharded clusters. It may lead to your situation. However, this happens only when the server is migrating chunks. Before going any further, please let me know how your sharded cluster is built. What's the shard key?
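To illustrate how a migration could inflate the count, here is a toy Java model. It is not MongoDB code. It assumes that a cluster-wide count simply adds up what each shard reports, and that during a chunk migration the documents are copied to the receiving shard before they are deleted from the donor. All class and method names are made up for the sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of two shards; each shard is just a set of document ids.
class ShardedCountSketch {
    final Set<Integer> shardA = new HashSet<>();
    final Set<Integer> shardB = new HashSet<>();

    // Assumed behaviour: the cluster count is the sum of per-shard counts.
    long count() {
        return (long) shardA.size() + shardB.size();
    }

    // Migration step 1: copy ids in [from, to) from shard A to shard B.
    void copyChunk(int from, int to) {
        for (int id = from; id < to; id++) {
            if (shardA.contains(id)) {
                shardB.add(id);
            }
        }
    }

    // Migration step 2: remove the migrated range from the donor shard.
    void deleteFromDonor(int from, int to) {
        for (int id = from; id < to; id++) {
            shardA.remove(id);
        }
    }

    public static void main(String[] args) {
        ShardedCountSketch cluster = new ShardedCountSketch();
        for (int id = 0; id < 1000; id++) {
            cluster.shardA.add(id);
        }
        System.out.println("before migration: " + cluster.count()); // 1000
        cluster.copyChunk(0, 250);
        System.out.println("during migration: " + cluster.count()); // 1250
        cluster.deleteFromDonor(0, 250);
        System.out.println("after cleanup:    " + cluster.count()); // 1000
    }
}
```

In this model the count is too high only while a copied range has not yet been deleted from the donor, and it drops back to the true number once cleanup finishes. That would be consistent with a count that fluctuates above the inserted total while the balancer is moving chunks and then settles on the correct value.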

