I usually script most of the configuration of a SharePoint 2010 farm using PowerShell, be it for a lab, development, QA, testing, or production. I say most because there are components I haven't included in my scripts, either because the functionality isn't there or because I can't find an appropriate set of PowerShell cmdlets. Something I hadn't looked into was setting the search crawl schedules, because it's simple enough to configure from Central Administration. I finally got tired of setting up the schedules manually (or, more likely, forgetting to set them up), so I dug in to figure it out.
Schedules
Crawl schedules are a group of properties of a content source that tell the search service application when the content source should be crawled. There are two types of schedules: full and incremental. A full crawl runs through the entire content source, adding everything it finds to the index. An incremental crawl compares the content source to the index to find changes and updates the index accordingly. Depending on the size of a content source, a full crawl can take hours and be very resource intensive, while an incremental crawl will take less time (though it can be just as resource intensive). How often you run these depends on your infrastructure, your search requirements, and how often the content changes.
Frequency
You can schedule the crawl by three main levels of granularity: monthly, weekly, or daily. Functionally, this is similar to Outlook's monthly, weekly, and daily recurring appointments.
- Monthly schedules run on the specified days of the specified months. You configure which months the crawl runs, which days of the month, and at what time it starts. The crawl can also repeat throughout that day.
- Weekly schedules run every nth week, where you specify n, the days of the week to run, and the start time. A weekly crawl can repeat throughout the day as well.
- Daily schedules run every nth day, starting at a specified time, and can repeat throughout the day.
Example
To see how these frequencies look, let's consider the following highly-realistic example:
Contoso Corporation is a government contractor specializing in the development of specialized equipment for a large government agency. Contoso has an intranet farm (CONTRANET) it uses for employees to collaborate on designs, build a knowledge base for their product development, and store its financial records. Contoso believes in being open with high-level financial information and provides this data to employees in spreadsheets in a site owned by the accounting department. Some employees have the financial data site bookmarked in their browser, but most find the latest or historical data with search. Management wants employees to be able to find the financial information immediately as of the fifteenth day of every month (it's usually posted sometime between the 1st and 14th depending on how much money Contoso made in the previous month). The Business Continuity Team is responsible for managing Contoso's backup infrastructure and wants the search index to be as up to date as possible before running their weekly full backups of the SharePoint farm so they can reduce the effort in performing a restore and ensure they meet their regulatory-compliant SLA should disaster strike. Additionally, employees do not want stale search results — if someone uploaded a new document or updated a wiki page within CONTRANET more than an hour ago it should appear in the search results.
Hank is one of CONTRANET's SharePoint administrators and is also a web developer who wears ironic CSS-themed t-shirts. Hank's been tasked to set up the crawl schedules for CONTRANET. After reviewing the business requirements, he decides he needs three schedules:
- Monthly full crawl on the 15th of the month just after midnight to ensure all the financial information is present (the full crawl in CONTRANET only runs for 30 minutes despite the terabytes of data because Contoso invests in only top of the line storage hardware)
- Weekly full crawl every Friday night at 10:30 PM to ensure the business continuity team's weekly backups contain an up-to-date index
- Daily incremental crawl every 30 minutes throughout the day to ensure all changes are in the index as soon as possible so the employees don't have to keep refreshing the search results page
With a bit of tweaking, we can turn Hank's findings into statements that look more like SharePoint crawl schedules:
- The full monthly crawl runs on the 15th day of every month starting at 12:01 AM and does not repeat within the day.
- The full weekly crawl runs every 1 week on Friday at 10:30 PM and does not repeat within the day.
- The incremental daily crawl should be run every 1 day starting at 12:00 AM and repeating every 30 minutes for 1440 minutes.
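If the 1440 looks arbitrary, here's the arithmetic behind it as a quick sketch. The variable names are mine, and the run count is just the number of start times that fit in the window; whether every one of them actually starts a crawl depends on the previous crawl having finished.

```powershell
# Values from the daily incremental schedule described above.
$repeatIntervalMinutes = 30
$repeatDurationMinutes = 1440

# 1440 minutes is exactly one day.
$hoursCovered = $repeatDurationMinutes / 60

# Start times that fit in the window: 12:00 AM, 12:30 AM, ... 11:30 PM.
$startsPerDay = $repeatDurationMinutes / $repeatIntervalMinutes

"{0} hours covered, {1} crawl start times per day" -f $hoursCovered, $startsPerDay
```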
We have three schedules — two full schedules and one incremental schedule. But this is a problem: a content source can only have one full and one incremental schedule. In order to make this work, Hank needs to create a second content source. He'll create a content source for the accounting site (out of scope for this discussion) and give it the full monthly schedule. (More on this below.)
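To make the one-full, one-incremental rule concrete, here's a rough sketch that checks a planned set of schedules for clashes before anything touches the farm. The objects and the Get-ScheduleConflict function are purely my own illustration (they are not SharePoint objects or cmdlets); the idea is just to group by content source and schedule type and flag any group with more than one entry.

```powershell
# Hank's three planned schedules, all on CONTRANET to begin with.
# These are plain illustrative objects, not SharePoint schedule objects.
$planned = @(
    New-Object PSObject -Property @{ Source = 'CONTRANET'; Type = 'Full';        Frequency = 'Monthly' }
    New-Object PSObject -Property @{ Source = 'CONTRANET'; Type = 'Full';        Frequency = 'Weekly'  }
    New-Object PSObject -Property @{ Source = 'CONTRANET'; Type = 'Incremental'; Frequency = 'Daily'   }
)

# Hypothetical helper: returns every (Source, Type) pair that appears more than once.
function Get-ScheduleConflict {
    param([object[]]$Schedule)

    $Schedule |
        Group-Object -Property Source, Type |
        Where-Object { $_.Count -gt 1 }
}

# Two full schedules on one content source: one conflict.
$conflictsBefore = @(Get-ScheduleConflict -Schedule $planned)

# Move the monthly full crawl to its own content source, as Hank does.
$planned[0].Source = 'Accounting Site'
$conflictsAfter = @(Get-ScheduleConflict -Schedule $planned)

"Conflicts before: {0}, after: {1}" -f $conflictsBefore.Count, $conflictsAfter.Count
```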
(Remember how I said this was a highly-realistic example scenario? Obviously the schedules we came up with are ridiculous, but this way I can show you how to configure both a full and incremental schedule, and the daily, weekly, and monthly schedules. You probably won't use a strategy like Hank's in your environment; I'm just being thorough for demonstrative purposes.)
Read The Fine Manual
Before we look at implementing Hank's schedules, let's take a moment to check the documentation.
To add a crawl schedule with PowerShell, you use the Set-SPEnterpriseSearchCrawlContentSource cmdlet. If you clicked that link as of the time of this post being published, you may (or may not) be surprised that there is a lot of content there, but it doesn't really explain how to use the cmdlet or how to generate the appropriate schedule. It lists five ways to run the command but makes no attempt to explain the purpose of each. If you scroll all the way to the bottom of the page, you will find an example with a clue as to how the cmdlet works.
Parameters
First, there is the ScheduleType parameter. ScheduleType lets us pick whether the schedule is full or incremental. Easy.
Next, to specify a monthly, weekly, or daily schedule, you use the appropriate MonthlyCrawlSchedule, WeeklyCrawlSchedule, or DailyCrawlSchedule parameter. Simple.
Now it gets complicated. Well, not so much complicated as it is involved.
The CrawlScheduleRunEveryInterval parameter is used for daily and weekly schedules. It's the "Run every # days" and "Run every # weeks" field. It is not used for Monthly schedules.
The CrawlScheduleDaysOfWeek parameter specifies the days of the week for a weekly schedule. Enter the days as a comma separated string of day names. For example, every day of the week is: "Sunday,Monday,Tuesday,Wednesday,Thursday,Friday,Saturday" (and you would include the quotes because it's a string).
The CrawlScheduleDaysOfMonth parameter specifies the days of the month (1-31) for a monthly schedule. For multiple days, provide a comma separated string: "1,8,15,22,29"
The CrawlScheduleMonthsOfYear parameter specifies the months of the year. Enter them as a comma separated string of month names. The entire year is "January,February,March,April,May,June,July,August,September,October,November,December".
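Typing out all seven days or all twelve months is an easy place to make a typo. As a small sketch, you can build both strings from .NET instead. This only produces the same comma separated text shown above; it assumes nothing about the cmdlet beyond what's already been described.

```powershell
# Every day of the week, in the order of the .NET System.DayOfWeek enum (Sunday first).
$allDays = [Enum]::GetNames([System.DayOfWeek]) -join ','

# Every month name, taken from the invariant (English) culture so the
# result does not depend on the server's regional settings.
$format = [System.Globalization.CultureInfo]::InvariantCulture.DateTimeFormat
$allMonths = (1..12 | ForEach-Object { $format.GetMonthName($_) }) -join ','

$allDays
$allMonths
```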
The CrawlScheduleStartDateTime parameter specifies the starting time. Enter this in either 12- or 24-hour format ("1:00 PM" or "13:00"). If you don't include this, it will default to 12:00 AM (00:00). It's worth noting that in Central Administration this field is a drop-down menu with 24 options, one for every hour in the day. If you want to start your jobs at a time that isn't the top of the hour, you will need to use an alternative method such as PowerShell.
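If you're unsure how a time string will be read, you can try it against .NET's DateTime parser first. This is only a sanity check under the assumption that the cmdlet's conversion behaves like an invariant-culture parse; it doesn't call the cmdlet itself.

```powershell
$culture = [System.Globalization.CultureInfo]::InvariantCulture

# The 12-hour and 24-hour spellings of the same start time.
$twelveHour     = [datetime]::Parse('1:00 PM', $culture)
$twentyFourHour = [datetime]::Parse('13:00', $culture)

# A start time that isn't at the top of the hour (Hank's monthly crawl).
$justAfterMidnight = [datetime]::Parse('00:01', $culture)

# Compare only the time of day; the date part defaults to today.
$sameTime = $twelveHour.TimeOfDay -eq $twentyFourHour.TimeOfDay
"Same time of day: $sameTime"
```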
The CrawlScheduleRepeatInterval parameter enables the "Repeat within the day" option and sets the "every" number of minutes. For example, to repeat the crawl every 30 minutes, set this to 30. If you don't want the crawl to repeat, leave this parameter out.
The CrawlScheduleRepeatDuration parameter is the "for" value when repeating within the day. (Combined with CrawlScheduleRepeatInterval, you get: Repeat within the day every [CrawlScheduleRepeatInterval] for [CrawlScheduleRepeatDuration].)
Plugging all that together, we get the following three commands for creating Hank's three schedules:
- Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Full -MonthlyCrawlSchedule -CrawlScheduleDaysOfMonth 15 -CrawlScheduleMonthsOfYear "January,February,March,April,May,June,July,August,September,October,November,December" -CrawlScheduleStartDateTime 00:01 -Confirm:$false
- Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Full -WeeklyCrawlSchedule -CrawlScheduleRunEveryInterval 1 -CrawlScheduleDaysOfWeek "Friday" -CrawlScheduleStartDateTime "10:30 PM"
- Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Incremental -DailyCrawlSchedule -CrawlScheduleRunEveryInterval 1 -CrawlScheduleRepeatInterval 30 -CrawlScheduleRepeatDuration 1440 -Confirm:$false
The -Confirm:$false parameter and value are to suppress the confirmation prompt from Set-SPEnterpriseSearchCrawlContentSource. You would want this if you are automating this configuration.
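Those one-liners get long. One way to keep them readable in a script is PowerShell splatting, sketched below for two of Hank's schedules. The hashtable keys are the same parameter names used above; $contentSource is a placeholder of mine for a content source object you've already retrieved, and the Get-Command guard is only there so the snippet does nothing on a machine without the SharePoint snap-in.

```powershell
# Weekly full schedule; keys are the cmdlet's parameter names.
$weeklyFull = @{
    ScheduleType                  = 'Full'
    WeeklyCrawlSchedule           = $true      # switch parameter
    CrawlScheduleRunEveryInterval = 1
    CrawlScheduleDaysOfWeek       = 'Friday'
    CrawlScheduleStartDateTime    = '10:30 PM'
    Confirm                       = $false     # suppress the confirmation prompt
}

# Daily incremental schedule, repeating every 30 minutes for 24 hours.
$dailyIncremental = @{
    ScheduleType                  = 'Incremental'
    DailyCrawlSchedule            = $true      # switch parameter
    CrawlScheduleRunEveryInterval = 1
    CrawlScheduleRepeatInterval   = 30
    CrawlScheduleRepeatDuration   = 1440
    Confirm                       = $false
}

# Only attempt the calls where the SharePoint cmdlet is available.
# $contentSource is a placeholder, not something defined in this snippet.
if (Get-Command Set-SPEnterpriseSearchCrawlContentSource -ErrorAction SilentlyContinue) {
    $contentSource | Set-SPEnterpriseSearchCrawlContentSource @weeklyFull
    $contentSource | Set-SPEnterpriseSearchCrawlContentSource @dailyIncremental
}
```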
The dirty part
First load up an elevated PowerShell window and get the Search Service Application instance:
    Windows PowerShell
    Copyright (C) 2009 Microsoft Corporation. All rights reserved.

    PS > Add-PSSnapin Microsoft.SharePoint.PowerShell
    PS > Get-SPServiceApplication

    DisplayName          TypeName             Id
    -----------          --------             --
    State Service App... State Service        e82c77a1-7074-4b68-9cf2-cc24b7449b53
    Managed Metadata ... Managed Metadata ... b9866b7a-d33e-4b7b-a833-dd5ab4dd9657
    Web Analytics Ser... Web Analytics Ser... 635d3e34-98b8-486f-b436-bf379a6d8f0d
    Security Token Se... Security Token Se... 7af69ee2-a541-4fc5-930f-845464a6100a
    Application Disco... Application Disco... 6e5cceac-394c-4f4c-8a1f-8cfb0f0a9c31
    Usage Service App... Usage and Health ... 339834b1-3fbb-439e-9806-88c12361d449
    Search Administra... Search Administra... 76743784-caea-42fe-90cc-acf0d49a212e
    User Profile Serv... User Profile Serv... 606dcb57-9b79-4ac5-8ddf-1d71ea0b7804
    Search Service Ap... Search Service Ap... ba30ed2e-e74a-470e-9e4b-1842ab472519

    PS > $searchapp = Get-SPServiceApplication ba30ed2e-e74a-470e-9e4b-1842ab472519
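Copying the GUID out of the table works, but it's awkward in an automated script. As an alternative sketch, the Get-SPEnterpriseSearchServiceApplication cmdlet returns search service applications directly. I'm assuming here that the farm has a single search service application, so taking the first result is good enough; the guard just keeps the snippet harmless where the snap-in isn't loaded.

```powershell
# Assumption: the farm has exactly one search service application.
if (Get-Command Get-SPEnterpriseSearchServiceApplication -ErrorAction SilentlyContinue) {
    # With no -Identity, the cmdlet returns every search service application in the farm.
    $searchapp = Get-SPEnterpriseSearchServiceApplication | Select-Object -First 1
}
```

If your farm has more than one, you'd want to filter on something more specific than "first".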
Hank has already gone ahead and created a new content source for the accounting site and there's already the content source for CONTRANET. When he enumerates the content sources in the search application we'll see an array:
    PS > $content_sources = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $searchapp
    PS > $content_sources

    Name             Id  Type        CrawlState  CrawlCompleted
    ----             --  ----        ----------  --------------
    Local SharePo... 2   SharePoint  Idle
    CONTRANET        7   SharePoint  Idle
    Accounting Site  8   SharePoint  Idle

    PS > $content_sources[1]

    Name             Id  Type        CrawlState  CrawlCompleted
    ----             --  ----        ----------  --------------
    CONTRANET        7   SharePoint  Idle

    PS > $content_sources[2]

    Name             Id  Type        CrawlState  CrawlCompleted
    ----             --  ----        ----------  --------------
    Accounting Site  8   SharePoint  Idle
To reduce some confusion, let's assign a new variable to both CONTRANET and Accounting Site content sources:
    PS > $content_contranet = $content_sources[1]
    PS > $content_contranet

    Name             Id  Type        CrawlState  CrawlCompleted
    ----             --  ----        ----------  --------------
    CONTRANET        7   SharePoint  Idle

    PS > $content_accounting = $content_sources[2]
    PS > $content_accounting

    Name             Id  Type        CrawlState  CrawlCompleted
    ----             --  ----        ----------  --------------
    Accounting Site  8   SharePoint  Idle
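Array indexes are fine at the prompt, but they break as soon as someone adds or removes a content source. Here's a sketch that picks the content sources by name instead. Get-ContentSourceByName is a helper of my own, not a SharePoint cmdlet, and the stand-in objects exist only so you can try the filter away from a farm; on a real farm you'd pass in the $content_sources array from above.

```powershell
# Hypothetical helper: return the content source(s) whose Name matches exactly.
function Get-ContentSourceByName {
    param(
        [object[]]$Sources,
        [string]$Name
    )

    $Sources | Where-Object { $_.Name -eq $Name }
}

# Stand-in objects carrying only the two properties the filter needs.
$sample_sources = @(
    New-Object PSObject -Property @{ Name = 'CONTRANET';       Id = 7 }
    New-Object PSObject -Property @{ Name = 'Accounting Site'; Id = 8 }
)

$contranet  = Get-ContentSourceByName -Sources $sample_sources -Name 'CONTRANET'
$accounting = Get-ContentSourceByName -Sources $sample_sources -Name 'Accounting Site'

"CONTRANET Id: {0}, Accounting Site Id: {1}" -f $contranet.Id, $accounting.Id
```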
Now that we have our content sources, we can set the schedules. The CONTRANET source has a full and incremental schedule while the Accounting Site source has a full schedule:
    PS > $content_contranet | Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Full -WeeklyCrawlSchedule -CrawlScheduleRunEveryInterval 1 -CrawlScheduleDaysOfWeek "Friday" -CrawlScheduleStartDateTime "10:30 PM"
    PS > $content_contranet | Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Incremental -DailyCrawlSchedule -CrawlScheduleRunEveryInterval 1 -CrawlScheduleRepeatInterval 30 -CrawlScheduleRepeatDuration 1440 -Confirm:$false
    PS > $content_accounting | Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Full -MonthlyCrawlSchedule -CrawlScheduleDaysOfMonth 15 -CrawlScheduleMonthsOfYear "January,February,March,April,May,June,July,August,September,October,November,December" -CrawlScheduleStartDateTime 00:01 -Confirm:$false
We can now check out the schedules:
    PS > $content_contranet.FullCrawlSchedule

    WeeksInterval  : 1
    DaysOfWeek     : Friday
    BeginDay       : 12
    BeginMonth     : 8
    BeginYear      : 2011
    StartHour      : 22
    StartMinute    : 30
    RepeatDuration : 0
    RepeatInterval : 0
    Description    : At 10:30 PM every Fri of every week, starting 8/12/2011
    NextRunTime    : 8/12/2011 10:30:00 PM

    PS > $content_contranet.IncrementalCrawlSchedule

    DaysInterval   : 1
    BeginDay       : 12
    BeginMonth     : 8
    BeginYear      : 2011
    StartHour      : 0
    StartMinute    : 0
    RepeatDuration : 1440
    RepeatInterval : 30
    Description    : Every 30 minute(s) from 12:00 AM for 24 hour(s) every day, starting 8/12/2011
    NextRunTime    : 8/12/2011 2:00:00 PM

    PS > $content_accounting.fullcrawlschedule

    DaysOfMonth    : Day15
    MonthsOfYear   : AllMonths
    BeginDay       : 12
    BeginMonth     : 8
    BeginYear      : 2011
    StartHour      : 0
    StartMinute    : 1
    RepeatDuration : 0
    RepeatInterval : 0
    Description    : At 12:01 AM on day 15 of every month, starting 8/12/2011
    NextRunTime    : 8/15/2011 12:01:00 AM

    PS > $content_accounting.IncrementalCrawlSchedule
(There was nothing returned for the accounting incremental schedule since we did not create an incremental schedule.)
Since the schedules look good, let's kick off a full crawl for good measure:
    PS > $content_contranet.StartFullCrawl()
    PS > $content_accounting.StartFullCrawl()
And that's it.
References
- Set-SPEnterpriseSearchCrawlContentSource: PowerShell cmdlet reference for setting the properties of a content source