General Approach
I wanted to be able to run the script multiple times, so I decided to put my data directories under c:\temp, and to delete them at the start of the run. Similarly, I spawn all the "mongod" processes as windows, so they are easy to kill with a right click on the task bar. (I didn't want to kill all mongod processes, because I didn't want to touch the one running as a service, which supports my development Sitecore instances.) Also, I used simple values for my port numbers: the mongod processes run on ports 30000 to 30008, the configuration servers on 40000, 40001, and 40002, and the mongos (which functions as a router in a sharded environment) on port 50000. I have also added some diagnostics to check on the status of various steps in the process, rather than simply waiting for an arbitrary 60 seconds, as the original script does.
Clean Up, Set Up
The script begins by establishing a temporary directory, cleaning up an old copy and creating an output function, "report", to facilitate nicely formatted status reporting. The output of new-module and new-item is piped to Out-Null to keep the output stream clean.
$rootpath = "/temp/mongoshards/"

new-module -scriptblock {function report($text) {
    write-output $("-" * $text.length)
    write-output $text
    write-output $("-" * $text.length)
    write-output ""
}} | Out-Null

report "Remove temporary directory"
# ignore the error on a first run, when the directory does not exist yet
remove-item $rootpath -recurse -ErrorAction SilentlyContinue

report "Create data directories"
new-item -type directory -path $rootpath | Out-Null
Creating the Mongod processes
The logic to create the mongod processes is pretty straightforward:

report "Create mongod instances"
$shards = 0..2
foreach ($shard in $shards) {
    $rss = 0..2
    foreach ($rs in $rss) {
        $dbpath = "$rootpath/data/shard${shard}/r${rs}"
        new-item -type directory -path $dbpath | Out-Null

        # Start mongod processes
        $port = 30000 + ($shard * 3) + $rs
        # $args is an automatic variable in PowerShell, so use a different name
        $mongodArgs = "--replSet s$shard --logpath $rootpath/s${shard}_r${rs}.log --dbpath $dbpath --port $port --oplogSize 64 --smallfiles"
        start-process mongod.exe $mongodArgs
    }
    # the outer foreach loop is still open here; the replica set
    # configuration in the next section runs inside it
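As a quick check of the port arithmetic in the loop above, here is a minimal sketch that only computes the mapping and starts no processes. Get-MongodPort and $portMap are hypothetical names introduced for illustration; they are not part of the original script.

```powershell
# Hypothetical helper: the same arithmetic as in the loop, 30000 + (shard * 3) + rs.
function Get-MongodPort([int]$shard, [int]$rs, [int]$basePort = 30000, [int]$replicasPerShard = 3) {
    return $basePort + ($shard * $replicasPerShard) + $rs
}

# Build one row per mongod instance (3 shards x 3 replica set members).
$portMap = foreach ($shard in 0..2) {
    foreach ($rs in 0..2) {
        [pscustomobject]@{
            ReplicaSet = "s$shard"
            DbPath     = "data/shard$shard/r$rs"
            Port       = Get-MongodPort $shard $rs
        }
    }
}

# Show the mapping as a table.
$portMap | Format-Table -AutoSize
```

Printing the table before launching anything makes it easy to confirm that no two instances share a port or a data directory.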
The only trickiness here is the variable substitution, leading to paths like "data/shard0/r1", and the logic to create the port numbers: 30000, 30001, 30002 for the shard 0 processes, 30003-30005 for s1, and 30006-30008 for s2. Of course, these are not yet replica sets; we handle that next.
Creating the Replica Sets
This is done by creating a config document and passing it to rs.initiate.

    report "Configure replica sets"
    $port1 = 30000 + $shard * 3
    $port2 = 30000 + $shard * 3 + 1
    $port3 = 30000 + $shard * 3 + 2
    $configBlock = "{_id: ""s$shard"", members: [ {_id:0, host:""localhost:$port1""}, {_id:1, host:""localhost:$port2""}, {_id:2, host:""localhost:$port3""}]}"
    echo "rs.initiate($configBlock)" | mongo --port $port1
}   # end of the outer foreach over $shards

The echo "javascript" | mongo idiom is a nice bit of syntax I picked up from the course, and it simplifies passing MongoDB commands from a script. Since it takes a little while for a server to win an election and become a PRIMARY, we set up a one-second loop to look for this event:
report "Check PRIMARY elected for each replica set"
while ($True) {
    $response1 = (echo "rs.status()" | mongo --port 30000)
    $response2 = (echo "rs.status()" | mongo --port 30003)
    $response3 = (echo "rs.status()" | mongo --port 30006)
    if (($response1 -clike "*PRIMARY*") -and ($response2 -clike "*PRIMARY*") -and ($response3 -clike "*PRIMARY*")) {
        break
    }
    Start-Sleep -s 1
    Write-Output "."
}
report "PRIMARY elected"

Note that capturing the output of a command creates an array of strings, and that the comparison operator -clike, applied to such an array, returns the elements that match case-sensitively; a non-empty result counts as true in the if test.
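The array behaviour of -clike can be tried without a running server. The sketch below uses a few made-up lines in place of real rs.status() output; $response, $primaryLines, $lowerLines and $anyCase are names chosen for this illustration only.

```powershell
# Hypothetical sample lines standing in for captured rs.status() output.
# Captured output of a native command is an array with one string per line.
$response = @(
    '"stateStr" : "PRIMARY",'
    '"stateStr" : "SECONDARY",'
    '"stateStr" : "SECONDARY",'
)

# On an array, -clike filters: it returns the matching elements, not a boolean.
$primaryLines = $response -clike "*PRIMARY*"

# -clike is case-sensitive, so a lowercase pattern matches nothing here.
$lowerLines = $response -clike "*primary*"

# -like is case-insensitive, so the same lowercase pattern does match.
$anyCase = $response -like "*primary*"

# A non-empty result is treated as true, an empty one as false.
if ($primaryLines) { Write-Output "PRIMARY found" }
if (-not $lowerLines) { Write-Output "no lowercase match" }
```

This is why the if test in the loop works even though -clike never returns $true or $false for array input.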
Creating the Shards
Three steps are left to create the shards. First, we need to create the configuration servers that will store which records go where, and then we need to define each replica set as a shard. Finally, we need to specify the collection and key that will be used for sharding the data.

report "Create config servers"
$cfg_a = "${rootpath}/data/config_a"
$cfg_b = "${rootpath}/data/config_b"
$cfg_c = "${rootpath}/data/config_c"
new-item -type directory -path $cfg_a | Out-Null
new-item -type directory -path $cfg_b | Out-Null
new-item -type directory -path $cfg_c | Out-Null
$arg_a = "--dbpath $cfg_a --logpath ${rootpath}/cfg-a.log --configsvr --smallfiles --port 40000"
$arg_b = "--dbpath $cfg_b --logpath ${rootpath}/cfg-b.log --configsvr --smallfiles --port 40001"
$arg_c = "--dbpath $cfg_c --logpath ${rootpath}/cfg-c.log --configsvr --smallfiles --port 40002"
start-process mongod $arg_a
start-process mongod $arg_b
start-process mongod $arg_c
report "Config servers up"

The three configuration servers store the definitive version of what data resides where; the mongos instances keep this data in memory.
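Since the three configuration servers differ only in a letter and a port number, the argument strings could also be generated in a loop. This is only a sketch of that alternative, not what the script does: $configServers and its DbPath and Arguments properties are names invented here, and nothing is created or started.

```powershell
# Same root path as in the script (assumed here so the sketch is self-contained).
$rootpath = "/temp/mongoshards/"
$names = 'a', 'b', 'c'

# Build one object per config server: ports 40000, 40001, 40002.
$configServers = foreach ($i in 0..2) {
    $name = $names[$i]
    $dbpath = "${rootpath}/data/config_$name"
    [pscustomobject]@{
        DbPath    = $dbpath
        Arguments = "--dbpath $dbpath --logpath ${rootpath}/cfg-$name.log --configsvr --smallfiles --port $(40000 + $i)"
    }
}

# Print the argument strings for inspection.
$configServers | ForEach-Object { $_.Arguments }

# Launching would then be a second loop (left commented out: no side effects here):
# foreach ($cfg in $configServers) {
#     new-item -type directory -path $cfg.DbPath | Out-Null
#     start-process mongod $cfg.Arguments
# }
```

The explicit a/b/c version in the script is arguably easier to read for three servers; the loop mainly pays off if the number of servers might change.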
Once the configuration servers are set up, the next step is to launch mongos and add the shards. Note the step to make sure that port 50000 is online. Basically, if the response does not contain a line with the word "failed", the server is treated as online.
report "Launch mongos"
$args_s = "--port 50000 --logpath ${rootpath}/mongos-1.log --configdb localhost:40000,localhost:40001,localhost:40002"
start-process mongos $args_s

report "Check mongos online on port 50000"
while ($true) {
    # discard error output ($null, not a file named "null")
    $output = echo "" | mongo localhost:50000 2> $null
    if (-not ($output -like "*failed*")) {break}
    Start-Sleep -s 1
    Write-Output "."
}
report "Mongos available at port 50000"

report "Configure shards"
echo "db.adminCommand( { addshard: ""s0/localhost:30000"" })" | mongo --quiet --port 50000
echo "db.adminCommand( { addshard: ""s1/localhost:30003"" })" | mongo --quiet --port 50000
echo "db.adminCommand( { addshard: ""s2/localhost:30006"" })" | mongo --quiet --port 50000
echo "db.adminCommand( { enableSharding:""school"" })" | mongo --port 50000
echo "db.adminCommand( { shardCollection:""school.students"", key:{student_id:1} })" | mongo --port 50000

Loading some data
To get some data, I use a short JavaScript script (from MongoDB University) that inserts a list of students and course grades. Once this is done, I display the counts for the combined sharded collection and for each of the specific shards, followed by the output of sh.status(), which shows the breakpoints that MongoDB is using to distribute data.
report "Generate 100,000 documents"
$mongoUniversityScript = "db=db.getSiblingDB(`"school`");
types = ['exam', 'quiz', 'homework', 'homework'];
// 10,000 students
for (i = 0; i < 10000; i++) {
    // take 10 classes
    for (class_counter = 0; class_counter < 10; class_counter ++) {
        scores = []
        // and each class has 4 grades
        for (j = 0; j < 4; j++) {
            scores.push({'type':types[j],'score':Math.random()*100});
        }
        // there are 500 different classes that they can take
        class_id = Math.floor(Math.random()*501); // get a class id between 0 and 500
        record = {'student_id':i, 'scores':scores, 'class_id':class_id};
        db.students.insert(record);
    }
}"
echo $mongoUniversityScript | mongo --port 50000 --quiet

report "Total records, records in shard 1, 2, and 3"
echo "db.students.count()" | mongo school --port 50000
echo "db.students.count()" | mongo school --port 30000
echo "db.students.count()" | mongo school --port 30003
echo "db.students.count()" | mongo school --port 30006

report "sh.status() output"
echo "sh.status()" | mongo --port 50000

Again, I have a link to the full script at the top of the page. I'm a rank beginner at PowerShell, so please feel free to make suggestions about how style and substance could be improved.