Santa finally showed up on Thursday, when MVPs got to attend a presentation from project lead Tim Ward and got a link to download the software. Here are the highlights:
For Developers, A Big Release
Tim Ward stressed that this was a very big release, and one that targeted developers:
- Sitecore has been rewritten to leverage the .NET 4.5 API.
- Third-party libraries (Lucene, HTML Agility Pack, Newtonsoft) have been updated to the latest versions.
- Datasources now support GUIDs, so that renderings don’t break when items are moved. This is wired into the LinkManager, so you will get a warning if you delete an item referenced in a datasource field. Datasources can also be built on search queries.
- Developers now have a pipeline available to manipulate construction of the content tree, allowing suppression of items, surrogates, and items from external data sources.
- The new functionality has been broken out into separate DLLs.
- A new implementation of the Item class. There were not a lot of details on this on the call.
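To see why GUID-based datasources survive item moves, here is a minimal sketch in Python. It is a toy model only, not Sitecore code: `ContentTree`, `add`, `move`, and `resolve` are hypothetical names invented for illustration, and the item paths are made up.

```python
import uuid


class ContentTree:
    """Toy content tree: each item has a stable ID and a path that can change."""

    def __init__(self):
        self._path_by_id = {}

    def add(self, path):
        # Assign a GUID-like identifier that never changes for this item.
        item_id = str(uuid.uuid4())
        self._path_by_id[item_id] = path
        return item_id

    def move(self, item_id, new_path):
        # Moving an item changes its path but not its ID.
        self._path_by_id[item_id] = new_path

    def resolve(self, datasource):
        """Resolve a datasource given as an item ID or as a path.

        Returns the item ID, or None if nothing matches.
        """
        if datasource in self._path_by_id:
            return datasource
        for item_id, path in self._path_by_id.items():
            if path == datasource:
                return item_id
        return None


tree = ContentTree()
promo_id = tree.add("/content/home/promos/summer")

# Two renderings point at the same item, one by path and one by ID.
datasource_by_path = "/content/home/promos/summer"
datasource_by_id = promo_id

tree.move(promo_id, "/content/shared/promos/summer")

path_result = tree.resolve(datasource_by_path)  # None: the old path is gone
id_result = tree.resolve(datasource_by_id)      # still finds the moved item
print(path_result, id_result == promo_id)
```

The path-based reference stops resolving as soon as the item moves, while the ID-based reference keeps working, which is the behavior the datasource bullet above describes.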
Search, Refactored
At the heart of the release is a refactoring of indexing and search:
- Search providers are now a pluggable component, with support for enterprise-grade Solr in addition to Lucene, and support for RESTful-based ElasticSearch is on the way.
- It’s worth noting that Solr is built upon Lucene, but not Lucene.Net. When you run Solr, you are running a Java application. Solr can be housed on the same server as your Sitecore installation, on a separate machine, or in the cloud.
- Tim Ward stressed that while Lucene is an acceptable option for normal (fewer than 100 million documents) implementations, Solr offers compelling scalability features (such as support for multiple servers and auto sharding) and can handle billions of documents.
- All configuration settings for Lucene are now exposed through Sitecore configuration. This gives the developer a number of new knobs to turn in optimizing search performance.
- A rich and extensible Crawler API is available (includes an abstract class and an ICrawler interface), allowing customization to deal with a number of data types.
- LinqToSitecore has been updated from LinqToBuckets by providing IQueryable interfaces to Lucene and Solr. This is significant because IEnumerable simply provides an enumeration of items, forcing you to apply filter logic in memory, which is clearly not scalable if you are dealing with a large number of objects. An IQueryable implementation builds a query that is executed against an index. For instance, suppose you wanted to write a query for all product items that have a price under $10, and let’s say you had a very large number of products. An IEnumerable query would force you to iterate over all of them to select the qualifying items, whereas IQueryable pushes this work to the index. As a side note, Tim Ward described implementing IQueryable as one of the most difficult technical problems he has ever encountered.
- Enabling this Linq syntax is Sitecore Hibernate, an object modeler that looks a lot like Glass, but is based on index values rather than the Sitecore GetItem API. This should make it very fast, but does require you to define in configuration the fields you want the index to keep. Indexes break up fields into tokens, and do not keep the original values unless you tell them to, because this has a disk cost.
- You now get reindexing statistics from the Sitecore Desktop. Take a look:
- You can reindex a small part of a content tree. (This would have been EXTREMELY useful on my last project, where rebuilding an index took upwards of 8 hours).
- You can configure an index to auto-swap: the new index is built in a temporary location, so that a rebuild does not impact site functionality.
- HTML caching can now be triggered by an index rebuild (“Clear on Index Update”). This is helpful if your rendering uses index values, in the following scenario: You publish a change, the HTML cache is purged by the publish, but the index has not yet been updated. Bam, you are stuck with a stale value in cache until the next publish. This setting prevents that.
- Item Buckets are enabled out of the box. These are items that can store large numbers of children (e.g. tens of millions) while hiding presentation from the user. Instead of navigating a tree, the author is presented with a search interface that allows bulk querying and updating operations, and pages results.
- Search indexing and retrieval performance is vastly improved. Tim Ward reported that rebuilding an index with 10 million items on Sitecore 6.6 took 27 hours, and this was reduced to 98 minutes on the same equipment with Sitecore 7. Simple retrieval search time was slashed from 3.9 seconds to 0.3 seconds, and complex searches (sort, date, and wildcard) that produce .NET exceptions on 6.6 now work and take the same 0.3 seconds as simple searches.
- A verbose logging option for troubleshooting Lucene issues. Here’s a sample of the output it produces. This is very detailed, with raw and optimized views of each search. This option produces a fire-hose of data, for developer workstation debugging, not production servers:
- Indexing of the media library now supports the IFilter interface, an open standard for document types such as .DOC, .DOCX and .PDF to present text for indexing. This means that Word and PDF documents stored in the Media Library can be found by their contents. (Full disclosure: I have not been able to get this to work in my own testing so far.)
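The IEnumerable versus IQueryable point from the LINQ bullet above can be sketched in a few lines of Python. This is a conceptual toy, not the Sitecore or Lucene API: `PriceIndex`, `Query`, and `where_price_below` are hypothetical names, and a sorted list stands in for a real search index.

```python
import bisect


class PriceIndex:
    """Toy index: (price, product_id) pairs kept sorted by price."""

    def __init__(self, products):
        self._entries = sorted((price, pid) for pid, price in products.items())
        self._prices = [price for price, _ in self._entries]
        self.entries_read = 0  # counts how many index entries a query touched

    def ids_below(self, limit):
        # Binary search finds the cut-off, so only matching entries are read.
        cut = bisect.bisect_left(self._prices, limit)
        self.entries_read += cut
        return [pid for _, pid in self._entries[:cut]]


class Query:
    """Records a filter instead of running it, loosely in the spirit of IQueryable."""

    def __init__(self, index, limit=float("inf")):
        self._index = index
        self._limit = limit

    def where_price_below(self, limit):
        # Nothing is executed here; the condition is only remembered.
        return Query(self._index, limit)

    def execute(self):
        # The recorded condition is handed to the index to evaluate.
        return self._index.ids_below(self._limit)


# 1,000 made-up products priced 1.0 through 1000.0.
products = {f"p{i}": float(i) for i in range(1, 1001)}

# In-memory style: every product is examined to find the cheap ones.
examined = 0
cheap_in_memory = []
for pid, price in products.items():
    examined += 1
    if price < 10:
        cheap_in_memory.append(pid)

# Index style: the filter is pushed down to the index.
index = PriceIndex(products)
cheap_from_index = Query(index).where_price_below(10).execute()

print(examined, index.entries_read, len(cheap_from_index))
```

Both approaches return the same nine products, but the in-memory loop examines all 1,000 items while the index reads only the nine matching entries. The gap grows with the size of the catalog, which is the scalability argument made above.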
Search Presentation, Refactored
- The very clean search interface introduced with the Item Bucket module is the most significant change for the user. This has been extended into a number of contexts, such as the Page Editor and the Insert Link and Insert Media dialog boxes. This is where the full power of this feature comes into focus. An author will be able to pull in links and images from repositories of millions of items, using fine-grained, faceted search.
- When a search result is returned a user can apply bulk operations, such as publishing, deleting, changing personalization, and cloning:
- Finally, there are a few developer goodies: A FillDB screen that can create test items from a text file (say a public domain book from Project Gutenberg), and a LinqScratchPad for exploring the index.
There’s no word on when this preview will be released to the wider Sitecore community. In the meantime, you can read John West’s blog posts or download the shared source version of Item Buckets. Hopefully Sitecore Santa will come soon!
Thanks for the preview, Dan! I'm looking forward to playing with this when Sitecore releases it.
-Craig
Bookmarked with gusto
Great article. Very useful. Thanks.
I can confirm that indexing of media library files works after installing the following IFilter packs. I did this for Solr, but I am sure it will work for Lucene as well.
Office: http://www.microsoft.com/en-us/download/details.aspx?id=17062
Adobe: http://www.adobe.com/support/downloads/thankyou.jsp?ftpID=5542&fileID=5550
After installing the IFilters I needed to restart the website and reindex the content.
Thanks Dan, here is a blog post my colleague wrote to continue on ElasticSearch:
http://blog.navigationarts.com/an-alternative-search-solution-elasticsearch/