The first step was to extract the IDs from the serialized *.item files. At the PowerShell prompt, I navigated to the root of the TDS directory and used this command to get at the item IDs:
>Get-ChildItem -recurse | Select-String '^id:' | More
This looked like it was giving me the desired raw output.
But I wasn't 100% sure what was what. For example, what was the "3" doing before the ":id:"? To get a closer look at the data returned, I piped this into Format-Table:
>Get-ChildItem -recurse | Select-String '^id:' | Format-Table | More
This shows the fields I had to work with. One of them is LineNumber, which answers the question about the "3": it is the line number where the match was found in each file. To identify my duplicates, I needed to group by the "Line" field, which contains the Sitecore ID, and find the groups with a count greater than 1. A quick Google search turned up an article on how to do Group operations in PowerShell, so now I had this:
>Get-ChildItem -recurse | Select-String '^id:' | Group Line | More
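Group (an alias for Group-Object) does not return the matches themselves; it returns one group object per distinct value of Line. The small sketch below pipes three made-up ID lines through Select-String in place of real .item files, just to show the shape of that output: Count and Name are the columns sorted and filtered on in the next commands, and Group holds the original MatchInfo objects.

```powershell
# Three stand-in ID lines (made-up IDs), two of which share the same ID
$idMatches = 'id: {AAAA}', 'id: {BBBB}', 'id: {AAAA}' | Select-String -Pattern '^id:'

# Group-Object emits one GroupInfo object per distinct Line value:
#   Name  = the shared Line text
#   Count = how many matches have that Line
#   Group = the MatchInfo objects themselves
$groups = $idMatches | Group-Object Line
$groups | Select-Object Count, Name
```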
Following the example in that article, I used this to identify the duplicates:
Get-ChildItem -recurse | Select-String '^id:' | Group Line | Sort Count -Descending | Select -First 5
In my case I saw several IDs with a count of two, so this command gave me the information I needed. However, taking only the first five results would hide any duplicates beyond the fifth, so a more general approach is to keep every group with a count greater than 1:
Get-ChildItem -recurse | Select-String '^id:' | Group Line | Where {$_.Count -gt 1}
Or, with PowerShell 3.0 and later, you can drop the curly braces and use the simplified Where syntax:
Get-ChildItem -recurse | Select-String '^id:' | Group Line | Where Count -gt 1
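Knowing which IDs are duplicated still leaves the question of which files contain them. Because the Group property of each result holds the original matches, their paths can be pulled out directly. Below is a minimal sketch wrapped in a hypothetical function named Find-DuplicateItemId (not a built-in cmdlet); it assumes each serialized item stores its ID on a line of the form `id: {GUID}`, limits the search to *.item files, and requires PowerShell 3.0 or later for -File and the simplified Where syntax.

```powershell
# Hypothetical helper (not a built-in cmdlet): report each ID that appears in
# more than one *.item file, plus the files that contain it. Assumes IDs are
# stored on lines of the form "id: {GUID}".
function Find-DuplicateItemId {
    param(
        # Root of the TDS directory; defaults to the current location
        [string]$Path = '.'
    )
    # -Filter limits the search to *.item files; -File skips folders
    Get-ChildItem -Path $Path -Recurse -Filter *.item -File |
        Select-String -Pattern '^id:' |
        Group-Object Line |
        Where-Object Count -gt 1 |
        ForEach-Object {
            [pscustomobject]@{
                # Strip the "id:" prefix to leave just the GUID text
                Id    = ($_.Name -replace '^id:\s*', '').Trim()
                Count = $_.Count
                # Full path of every file whose id: line landed in this group
                Files = @($_.Group | Select-Object -ExpandProperty Path)
            }
        }
}
```

Run it from the root of the TDS directory as `Find-DuplicateItemId | Format-List`. Group-Object compares values case-insensitively by default, so the same GUID written in different letter case would still be grouped together.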
Most of the commands above also have shorter aliases, which speed up typing at the cost of legibility:
Get-ChildItem: gci
Select-String: sls
Where: ?
So the search could have been written as:
gci -recurse | sls '^id:' | group line | ? count -gt 1
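If you are unsure what a short name stands for, or which aliases a command has, Get-Alias can tell you. A minimal sketch for the three commands in the list above; the exact set of aliases varies by PowerShell version and platform (for example, ls is an alias for Get-ChildItem on Windows but not on Linux), so treat this output, rather than any fixed list, as the authority.

```powershell
# Ask PowerShell which aliases point at each of these commands
Get-Alias -Definition Get-ChildItem, Select-String, Where-Object |
    Select-Object Name, Definition
```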
Finally, in addition to Format-Table, Format-List (which writes out each property of each object returned) and Format-Wide (which writes out a single property of each object in a multi-column layout) are useful while you discover how your query is working. Out-GridView goes a step further and sends the results to a window that allows sorting, filtering, and selecting columns.
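To try these discovery tools without a TDS project on hand, the sketch below pipes two made-up ID lines through Select-String in place of real .item files. It also uses Get-Member, which lists the properties of the MatchInfo objects that Select-String emits (LineNumber and Line among them). Out-GridView is left commented out because it opens an interactive window and is not available on every platform.

```powershell
# Two stand-in ID lines (made-up IDs) in place of real .item files
$idMatches = 'id: {AAAA}', 'id: {BBBB}' | Select-String -Pattern '^id:'

# Get-Member: the property names available on each MatchInfo object
$idMatches | Get-Member -MemberType Property | Select-Object -ExpandProperty Name

# Format-List: every property of the first match, one per line
$idMatches | Select-Object -First 1 | Format-List *

# Format-Wide: a single property (Line) laid out in two columns
$idMatches | Format-Wide Line -Column 2

# Out-GridView opens an interactive window, so it is not run here:
# $idMatches | Group-Object Line | Out-GridView
```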
These articles were helpful as I figured out how to query with PowerShell:
And to learn about FakeDB's Serialization feature: