Would a change in scale factor affect sharding in MongoDB?


The MongoDB documentation says sharding should be employed when one server/replica set is insufficient to store the data.

Given a dataset that can be scaled to both 100 GB and 1 GB, with the same queries executed on both datasets:

Is sharding 100 GB across 5 shards of 20 GB each equivalent to sharding 1 GB across 5 shards of 200 MB each? Does the scale factor affect the way sharding is carried out in MongoDB? If yes, what changes would be observed?

Given a dataset that can be scaled to both 100 GB and 1 GB, with the same queries executed on both datasets, is sharding 100 GB across 5 shards of 20 GB each equivalent to sharding 1 GB across 5 shards of 200 MB each?

From a high-level view, the sharded cluster architecture is similar in both of your examples: 5 shards, 3 config servers, and some number of mongos processes. I would hesitate to call them "equivalent" in the same way a moped is not equivalent to a motorcycle, even though both are two-wheeled vehicles; how far the analogy holds depends on your viewpoint.

However, it is possible to start a 5-shard cluster with provisioned resources (RAM/CPU/storage) that meet your expected workload, and later upgrade (or downgrade) the same cluster's resources to match the changing requirements of your use case.

Would the scale factor affect the way sharding is carried out in MongoDB? If yes, what changes would be observed?

The main behavioural difference driven by the volume of sharded data is the cluster's balancing activity. Balancing is based on chunks: logical, contiguous ranges of shard key values that represent 64 MB of data by default.
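To see this concretely, you can count chunks per shard from the cluster's config database (sh.status() in the mongo shell reports the same overview). Below is a minimal pymongo sketch, assuming an older MongoDB release where config.chunks records the collection namespace in an ns field, a mongos at localhost:27017, and a placeholder namespace mydb.mycoll:

```python
# Minimal sketch: count chunks per shard for one sharded collection.
# Assumes a mongos at localhost:27017 and a pre-5.0 config.chunks schema
# where each chunk document carries its namespace in "ns".
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # connect through a mongos
config = client["config"]

pipeline = [
    {"$match": {"ns": "mydb.mycoll"}},                     # one sharded collection
    {"$group": {"_id": "$shard", "chunks": {"$sum": 1}}},  # chunks per shard
    {"$sort": {"chunks": -1}},
]
for doc in config["chunks"].aggregate(pipeline):
    print(doc["_id"], doc["chunks"])
```

(In recent MongoDB versions config.chunks is keyed by collection UUID rather than namespace, so the $match stage would need adjusting.)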

Balancing of chunks between shards is triggered by a migration threshold, based on the difference in chunk counts between the shards with the most and the fewest chunks, relative to the total number of chunks in the sharded collection:

| Number of chunks | Migration threshold |
|------------------|---------------------|
| Fewer than 20    | 2                   |
| 20-79            | 4                   |
| 80 and greater   | 8                   |

With 200 MB of data per shard, there would only be around 3 chunks per shard (roughly 15 overall).

With 20 GB of data per shard, there would be at least 312 chunks per shard (likely more, since chunks are split pre-emptively rather than only when they are full).
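As a back-of-envelope check of these figures (plain Python arithmetic; 64 MB is the default chunk size mentioned above, and the per-shard sizes come from the question's two scenarios):

```python
# Back-of-envelope chunk arithmetic for the two scenarios in the question.
DEFAULT_CHUNK_SIZE_MB = 64

def min_chunks_per_shard(data_per_shard_mb, chunk_size_mb=DEFAULT_CHUNK_SIZE_MB):
    """Lower bound on chunks per shard, assuming every chunk is completely full."""
    return max(1, data_per_shard_mb // chunk_size_mb)

print(min_chunks_per_shard(200))      # 1 GB / 5 shards   -> 200 MB/shard     -> 3 chunks
print(min_chunks_per_shard(20_000))   # 100 GB / 5 shards -> 20,000 MB/shard  -> 312 chunks
```

In practice the real counts are higher, since chunks are split before they fill up.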

If you choose a shard key that distributes data evenly across shards, re-balancing should not be required frequently. A poor shard key, on the other hand, will require more frequent balancing, and the problems become more evident at scale because of the I/O overhead of migrating chunks.
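For illustration, one common way to get an even write distribution is a hashed shard key. A minimal pymongo sketch follows, assuming a mongos at localhost:27017; the database mydb, collection events, and field user_id are placeholders, and whether a hashed key suits your workload depends on your query patterns:

```python
# Minimal sketch: shard a collection on a hashed key so inserts spread
# evenly across shards and the balancer has little work to do.
# Assumes a mongos at localhost:27017; the names below are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

client.admin.command("enableSharding", "mydb")   # allow sharding for the database
client.admin.command(
    "shardCollection",
    "mydb.events",                               # namespace to shard
    key={"user_id": "hashed"},                   # hashed shard key
)
```

Note that a hashed key trades away efficient range queries on that field, so the "right" key still depends on how the same queries behave at 1 GB versus 100 GB.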

