Google Search Appliance
Configuring GSA Unification
Google Search Appliance software version 7.2

Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com
GSA-UNI_100.02 December 2013
© Copyright 2013 Google, Inc. All rights reserved.

Google and the Google logo are registered trademarks or service marks of Google, Inc. All other trademarks are the property of their respective owners. Use of any Google solution is governed by the license agreement included in your original contract. Any intellectual property rights relating to the Google services are and shall remain the exclusive property of Google, Inc. and/or its subsidiaries (“Google”). You may not attempt to decipher, decompile, or develop source code for any Google product or service offering, or knowingly allow others to do so. Google documentation may not be sold, resold, licensed or sublicensed and may not be transferred without the prior written consent of Google. Your right to copy this manual is limited by copyright law. Making copies, adaptations, or compilation works without prior written authorization of Google is prohibited by law and constitutes a punishable violation of the law. No part of this manual may be reproduced in whole or in part without the express written consent of Google. Copyright © by Google, Inc.

Google Search Appliance: Configuring GSA Unification


Contents

Configuring GSA Unification
Introduction to GSA Unification
Limitations
How Search and Serve Work
Determining Search Appliance Roles
Using Collections to Direct User Searches
How Crawling and Indexing Work
How OneBox Modules Work
About Security
About Authentication and Authorization within a Unified Environment
Using Authorization on the Primary Search Appliance
Using Delegated Authorization
About Composite Collections
About Front Ends
About Crawl Patterns
About Database Crawling
About Timeout Intervals and Result Biasing
Unified Environment Checklist
Setting up Unified Environments
Adding or Deleting Nodes
Updating a GSA Unification Configuration
Setting up Mirroring in a Unified Environment
Troubleshooting
Using the GSA Unification Network Stats and GSA Unification Diagnostic Pages to Find Problems
Users See 404 Errors After Clicking Results
Results from Secondary Search Appliances are Not Available on Primary Search Appliance
Unexpected Authorization Behavior


Configuring GSA Unification

This guide contains the information you need to configure GSA unification. GSA unification, also called a unified environment, is a Google Search Appliance feature in which a group of search appliances is configured so that a body of documents spread out over several search appliances can be searched by a single search query. This document is for you if you are a search appliance administrator, network administrator, or another person who configures search appliances or networks. You need to be familiar with configuring crawl, serve, front ends, and security on the Google Search Appliance.

Introduction to GSA Unification

GSA unification, also called a unified environment, is a Google Search Appliance feature in which a group of search appliances is configured so that a body of documents spread out over several search appliances can be searched by a single search query. The search appliances in the configuration each crawl a different set, or corpus, of documents. Each search appliance is set up with its own collections, front ends, and other administrator-configurable features.

Configure a unified environment when you need to provide search and index services for a larger corpus of documents than a single Google Search Appliance can accommodate. For example, if you need to index 40 million documents, you might use four instances of the Google Search Appliance model GB-7007, with each search appliance licensed for 10 million documents.

Any model of the Google Search Appliance running software version 6.0 or later can be configured to participate in a unified environment. The configuration may include different search appliance models, provided they are all running the same software version. Use a unified environment with two or more search appliances.

If you have an existing unified environment, you can add more search appliances to increase the number of searchable documents or to locate search appliances in different geographic regions. For example, you might have search appliances in Tokyo and Beijing that use a unified environment and that index different sets of documents. If you install a search appliance in the Sydney office to index a different body of documents that you want available to Tokyo and Beijing users, you can add the Sydney search appliance to the unified environment.


One search appliance in the configuration is designated the primary search appliance or primary node. The other search appliances are designated the secondary search appliances or secondary nodes. Unified environments are typically set up so that end users’ search queries are directed to the primary search appliance. The primary search appliance searches its own index and issues a query to the indexes on the secondary search appliances. The secondary nodes return their results to the primary search appliance. The primary search appliance aggregates the search results from itself and the secondary search appliances, then serves the results to the user. The user does not need to repeat the search on each search appliance in the configuration.

Limitations

For information about GSA unification limitations, see Specifications and Usage Limits.

How Search and Serve Work

In a unified environment, the search and serve processes work seamlessly from the end users’ standpoint. Users submit queries and receive results on the same familiar Google Search Appliance search pages. You control which documents are searched by configuring collections and composite collections within the unified environment. For more information, see “Using Collections to Direct User Searches” on page 7.

The following graphic shows three search appliances in a unified environment:

• Search Appliance A indexes sales and marketing documents. It is the primary search appliance in the configuration.
• Search Appliance B indexes technical support documents. It is a secondary search appliance.
• Search Appliance C indexes accounting documents. It is a secondary search appliance.

Here’s what happens when a user wants to search for technical support, sales, and accounting information about a particular customer, Buzzword Advertising.

1. The user browses to the search page for Search Appliance A, the primary search appliance in the configuration, and types Buzzword Advertising in the search box.
2. Search Appliance A searches its local index, which contains sales and marketing documents.
3. Search Appliance A issues a query to Search Appliance B and Search Appliance C.
4. Search Appliance B searches its local index, which contains support information.
5. Search Appliance B sends the results back to Search Appliance A.
6. Search Appliance C searches its local index, which contains accounting information.
7. Search Appliance C sends the results back to Search Appliance A.
8. Search Appliance A merges its own results with the results from Search Appliance B and Search Appliance C and ranks the merged search results.
9. Search Appliance A returns all results for Buzzword Advertising to the user, including sales contracts, billing and payment information, and records of support contacts with the company.
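
The query flow in the steps above can be illustrated with a small simulation. This is only a conceptual sketch, not the search appliance's implementation: the in-memory indexes, the example.com URLs, the term-count score, and the names Result, search_local_index, and unified_search are all hypothetical, and the real appliances exchange queries over the network and rank results with their own algorithms.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    score: float
    node: str  # name of the search appliance that produced the hit


def search_local_index(node, index, query):
    """Search one appliance's local index (a dict of URL -> document text)."""
    terms = query.lower().split()
    hits = []
    for url, text in index.items():
        lowered = text.lower()
        if all(term in lowered for term in terms):
            # Toy relevance score (assumption): total number of term occurrences.
            score = float(sum(lowered.count(term) for term in terms))
            hits.append(Result(url, score, node))
    return hits


def unified_search(primary, secondaries, query):
    """Search the primary, query each secondary, then merge and rank."""
    primary_name, primary_index = primary
    merged = search_local_index(primary_name, primary_index, query)
    for name, index in secondaries:
        # Each secondary searches its own index and returns its results.
        merged.extend(search_local_index(name, index, query))
    # The primary ranks the merged list before serving it to the user.
    merged.sort(key=lambda hit: hit.score, reverse=True)
    return merged


# Hypothetical sample data for Search Appliances A, B, and C.
appliance_a = {"http://sales.example.com/contract":
               "Buzzword Advertising contract, signed by Buzzword Advertising"}
appliance_b = {"http://support.example.com/ticket":
               "Support contact from Buzzword Advertising",
               "http://support.example.com/other": "Unrelated support ticket"}
appliance_c = {"http://accounting.example.com/invoice":
               "Invoice for Buzzword Advertising"}

results = unified_search(("A", appliance_a),
                         [("B", appliance_b), ("C", appliance_c)],
                         "Buzzword Advertising")
```

The user only ever calls unified_search on the primary; the fan-out to the secondary indexes and the merge happen behind that single call, which mirrors why the user does not need to repeat the search on each appliance.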

Determining Search Appliance Roles

Each search appliance in a unified environment is also able to act independently of the configuration. For example, a user who wants to see only support documents related to Buzzword Advertising might connect directly to the search page for Search Appliance B and run the search query there.

A particular search appliance is able to act as both a primary and secondary node in relation to another search appliance. The following example illustrates a pair of unified environments consisting of two search appliances. In unified environment A, Search Appliance A is the primary node and Search Appliance B is the secondary node. In unified environment B, Search Appliance A is the secondary node and Search Appliance B is the primary node.

A more complex use case involves search appliances that are physically located far away from one another. For example, you might have search appliances in Tokyo, London, and Rio de Janeiro. Users in those locations might sometimes need to search local content only, while at other times they would need to search global content. To make these searches as efficient as possible, configure the unified environment as follows.

• Each search appliance is configured as a primary search appliance with the other search appliances as secondary search appliances. In the example, Tokyo is configured as primary with London and Rio de Janeiro as secondary; London is configured as primary with Tokyo and Rio de Janeiro as secondary; Rio de Janeiro is configured as primary with Tokyo and London as secondary.
• Use the composite collections feature to define both region-specific collections and a global collection. When search users in Rio de Janeiro search local content, their search is conducted only on content located on the Rio de Janeiro search appliance. For more information on composite collections, see “Using Collections to Direct User Searches” on page 7 and “About Composite Collections” on page 16.

Using Collections to Direct User Searches

A query to the primary search appliance in a unified environment returns results from all search appliances in the configuration. By default, all collections on all search appliances are searched when a query is directed to the primary search appliance. You can restrict which collections are searched in two ways:

• Use the site parameter of the query to define which collections are searched. For more information on the site parameter, see the Search Protocol Reference.
• Create composite collections on the primary search appliance, which are virtual collections encompassing the specific collections to be searched on each search appliance in the unified configuration. For more information on composite collections and how they work, see “About Composite Collections” on page 16.


If a user needs to search documents in a collection that is not included in a composite collection, the user must use the search page for that collection’s search appliance instead of the search page on the primary search appliance.
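
As a rough illustration of restricting a search with the site parameter, the following sketch builds a query URL for the primary search appliance. The host name, collection names, and the build_search_url helper are hypothetical. The /search path, the client and output parameters, and the use of a pipe character to join several collections are assumptions that should be checked against the Search Protocol Reference for your software version.

```python
from urllib.parse import urlencode


def build_search_url(host, query, collections, frontend="default_frontend"):
    """Build a search URL that restricts the query to the named collections."""
    params = {
        "q": query,
        # Assumption: several collections are joined with "|" (logical OR).
        "site": "|".join(collections),
        "client": frontend,
        "output": "xml_no_dtd",
    }
    return f"http://{host}/search?{urlencode(params)}"


# Hypothetical primary appliance host and collection names.
url = build_search_url("gsa-primary.example.com",
                       "Buzzword Advertising",
                       ["sales", "support"])
```

Note that urlencode percent-encodes the pipe character, so the site value appears as sales%7Csupport in the resulting URL.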

How Crawling and Indexing Work

Crawling and indexing in a unified environment are similar to crawling and indexing in single search appliance deployments. Each individual search appliance is configured with its own crawl patterns and each search appliance typically crawls a discrete body of documents. For more information about crawling and indexing in a single search appliance, see Administering Crawl.

Depending on how security is set up in a unified environment, you might have to duplicate the crawler access settings from each secondary search appliance on the primary search appliance to ensure that the primary search appliance can correctly authorize and serve results from the secondary search appliances. For more information, see “About Security” on page 8 and “About Crawl Patterns” on page 18.

How OneBox Modules Work

In a unified environment, OneBox module configuration is available only on the primary search appliance. In other words, results served from the primary search appliance include results from OneBox modules configured on the primary search appliance, not OneBox modules configured on the secondary nodes. Because spelling checkers are enabled as OneBox modules, spelling check is available only for documents indexed on the primary search appliance. A new feature, user-added results, also uses OneBox modules.

About Security

The Google Search Appliance uses secret tokens and private IP addresses to enforce security within a unified environment. The search appliances in a unified environment authenticate each other using shared secret tokens that you provide during configuration. The shared secret tokens must consist only of printable ASCII characters.

Certain communications among the search appliances in a unified environment are conducted over a secure private network, including search requests, search credentials transmitted as sessions, and search results that include snippets, whether the results are authorized or not authorized.

When you set up a unified environment, you provide additional IP addresses on the Admin Console that the search appliances use for communicating in a virtual private network. You do not need to change anything in your network infrastructure to meet the guidelines for the virtual private network. You do not need to set up a private network on your existing network. You only need to make sure that the IP addresses are available. When you provide the correct IP addresses, the search appliance creates the network. On the Admin Console interface, the private network IP addresses are called network IP addresses. You can assign or change the private IP addresses at any time.

The following guidelines apply to the private network IP addresses:

• The search appliance must be able to reach another search appliance’s public IP address on UDP port 500 and on IP protocol number 51 (IPsec AH). Both are used by IPsec, the security protocol for communications among the appliances in the configuration.
• Each search appliance in the configuration must be able to ping the other search appliances on their public IP address.
• The private IP addresses you choose must conform to the private address space as defined in RFC 1918 and must not overlap with the private address space used by the subnet to which the appliances are connected. For example, if the subnet where the search appliances are deployed uses 10.0.0.0/8, choose the private IP addresses from the 192.168.0.0/24 network. If the 192.168.0.0/24 network is used by the subnet, try the 192.168.1.0/24 range or the 172.16.0.0/12 range.
• Do not use a private IP address from the 192.168.255.0/24 network.
• Do not use 127.0.0.0/8.
• Do not use non-private address space such as 1.0.0.0/8 or 216.239.43.0/24.
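
Before entering network IP addresses on the Admin Console, it can help to check candidate addresses against the guidelines above. The following is a minimal sketch using Python's standard ipaddress module. The function name check_private_ip and the wording of the messages are hypothetical, and the check covers only the address rules in the list; it does not test reachability on UDP port 500, IP protocol 51, or ping.

```python
import ipaddress

# Private address space defined in RFC 1918.
RFC1918_NETWORKS = [ipaddress.ip_network(n)
                    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
# Range that the guidelines say not to use.
EXCLUDED_NETWORK = ipaddress.ip_network("192.168.255.0/24")


def check_private_ip(candidate, deployed_subnet):
    """Return a list of guideline violations; an empty list means none found."""
    address = ipaddress.ip_address(candidate)
    subnet = ipaddress.ip_network(deployed_subnet, strict=False)
    problems = []
    if address.is_loopback:
        problems.append("address is in 127.0.0.0/8")
    if not any(address in network for network in RFC1918_NETWORKS):
        problems.append("address is outside the RFC 1918 private ranges")
    if address in EXCLUDED_NETWORK:
        problems.append("address is in 192.168.255.0/24")
    if address in subnet:
        problems.append("address overlaps the subnet where the appliances are deployed")
    return problems


# Appliances deployed on 10.0.0.0/8, candidate taken from 192.168.0.0/24.
ok = check_private_ip("192.168.0.5", "10.0.0.0/8")
overlap = check_private_ip("10.1.2.3", "10.0.0.0/8")
```

Here ok is an empty list, while overlap reports that the candidate collides with the deployed subnet.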

The following requirements also apply to security in a unified environment:

• All security configurations on the Crawler Access pages on the secondary search appliances must be added to the Crawler Access page on the primary search appliance.
• The primary and secondary search appliances must use the same security policies.

About Authentication and Authorization within a Unified Environment

Authentication is the process by which the search appliance verifies a user’s identity. Authorization is the process by which the search appliance determines whether a particular authenticated user is permitted to view a particular document. For information on search appliance authentication and authorization configuration options, see the “Overview” of Managing Search for Controlled-Access Content.

You can set up a unified environment to handle user authorization during secure serve in one of two ways:

• The primary search appliance performs all authorization.
• The secondary search appliances perform authorization first. If a user cannot be authorized to see a particular document by the secondary search appliances, the primary search appliance attempts to perform the authorization. This process is called delegated authorization. Delegated authorization is enabled by checking a checkbox on the GSA Unification > Host Configuration page.

If you use a Google Search Appliance Connector for indexing and searching files in a content management system and you are setting up GSA unification, you can configure authorization in one of three ways.

• Configure the connector on the primary search appliance and use authorization on the primary appliance.
• Configure the connector on a secondary search appliance and use delegated authorization.
• Configure the connector on a secondary search appliance and the primary search appliance, then add a Do Not Follow Pattern on the primary appliance so that all connector crawling takes place on the secondary search appliance. Use authorization on the primary search appliance.
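
The order of checks in delegated authorization, as described earlier in this section, can be sketched as follows. This is a conceptual illustration only: the function delegated_authorize and the two authorizer callables are hypothetical stand-ins, and the sketch assumes that any result other than a successful authorization on the secondary node causes the primary node to attempt authorization.

```python
from typing import Callable

# An authorizer takes (user, url) and returns True if the user may view the document.
Authorizer = Callable[[str, str], bool]


def delegated_authorize(user: str, url: str,
                        secondary: Authorizer, primary: Authorizer) -> bool:
    """Try the secondary node first; fall back to the primary node."""
    if secondary(user, url):
        return True
    # The secondary node could not authorize the user, so the primary attempts it.
    return primary(user, url)


# Hypothetical stand-in rules, for illustration only.
def secondary_rule(user: str, url: str) -> bool:
    return url.startswith("http://support.example.com/")


def primary_rule(user: str, url: str) -> bool:
    return user == "admin"


allowed = delegated_authorize("alice", "http://support.example.com/ticket",
                              secondary_rule, primary_rule)
```

When authorization is performed only on the primary search appliance, the equivalent sketch would call primary alone for every result.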


Using Authorization on the Primary Search Appliance

Use authorization on the primary search appliance when you want all authorization to be performed on the primary search appliance. The following table tells you how to configure the primary and secondary search appliances when authorization is performed only on the primary search appliance.

Type of User Authentication: LDAP, HTTP Basic, NTLM HTTP, or Kerberos for public serve
How the User is Authenticated and Results are Authorized: Results are public and authorization is not required.
What to do on the Primary Search Appliance: Configure the Content Sources > Web Crawl > Secure Crawl > Crawler Access page on the Admin Console with all crawl patterns from the primary and all secondary search appliances. The primary search appliance does not crawl these pages and no authorization is required.
What to do on the Secondary Search Appliances: Configure the Crawler Access page on the Admin Console only with crawl patterns for the current secondary search appliance.

Type of User Authentication: LDAP, HTTP Basic, NTLM HTTP, or Kerberos for secure serve
How the User is Authenticated and Results are Authorized: User logs in to network domain. Credentials for authorization are collected at login time and results are authorized using head requests from the primary search appliance.
What to do on the Primary Search Appliance: Configure the Content Sources > Web Crawl > Secure Crawl > Crawler Access page on the Admin Console with all crawl patterns from the primary and secondary search appliances. The primary search appliance does not crawl these pages, but uses the crawl credentials for authorization. If there are SMB URLs, add those URLs to the Follow and Crawl Patterns field on the Content Sources > Web Crawl > Start and Block URLs page.
What to do on the Secondary Search Appliances: Configure the Content Sources > Web Crawl > Secure Crawl > Crawler Access page on the Admin Console only with crawl patterns for the current secondary search appliance.

Type of User Authentication: Cookie site or forms-based authentication for public serve
How the User is Authenticated and Results are Authorized: Serve is public. No result authorization at serve time required.
What to do on the Primary Search Appliance: Copy the configuration from the Crawler Access page on the secondary search appliances to the primary search appliance.
What to do on the Secondary Search Appliances: Configure the Content Sources > Web Crawl > Secure Crawl > Crawler Access page on the Admin Console only with crawl patterns for the current secondary search appliance.

Type of User Authentication: Forms-based authentication with external login for secure serve
How the User is Authenticated and Results are Authorized: User provides credentials on a form configured on the primary search appliance. The primary search appliance uses a cookie for authorization using the head requestor for each search result returned by a secondary search appliance.
What to do on the Primary Search Appliance: Configure forms authentication for serve.
What to do on the Secondary Search Appliances: Configure form-based authentication for crawl.

Type of User Authentication: Forms-based authentication with user impersonation for secure serve
How the User is Authenticated and Results are Authorized: User provides credentials on a form configured on the primary search appliance. The primary search appliance uses a cookie for authorization using the head requestor for each search result returned by a secondary search appliance.
What to do on the Primary Search Appliance: Configure forms authentication for serve.
What to do on the Secondary Search Appliances: Configure form-based authentication for crawl.

Type of User Authentication: SAML authentication with external authorization SPI
How the User is Authenticated and Results are Authorized: User provides credentials on a form configured on the primary search appliance. The primary search appliance uses a cookie for authorization using the head requestor for each search result returned by a secondary search appliance.
What to do on the Primary Search Appliance: Configure forms authentication for serve as on a single-search appliance configuration.
What to do on the Secondary Search Appliances: Configure form-based authentication for crawl as on a single-search appliance configuration.

Type of User Authentication: Forms-based authentication with external authorization SPI
How the User is Authenticated and Results are Authorized: User provides credentials on a form configured on the primary search appliance. The primary search appliance uses a cookie for authorization using the head requestor for each search result returned by a secondary search appliance.
What to do on the Primary Search Appliance: Configure forms authentication for serve as on a single-search appliance configuration.
What to do on the Secondary Search Appliances: Configure form-based authentication for crawl as on a single-search appliance configuration.

Type of User Authentication: Policy ACLs with an LDAP identity provider
How the User is Authenticated and Results are Authorized: User logs in to network domain. Credentials for authorization are collected at login time and results are authorized according to rules set in policy ACLs.
What to do on the Primary Search Appliance: Copy LDAP information and policy ACLs from the secondary search appliance.
What to do on the Secondary Search Appliances: Configure LDAP and policy ACLs.


Using Delegated Authorization

Use delegated authorization when you want authorization to be performed first on the secondary nodes, with authorization on the primary node only when a secondary node is unable to authorize a user to view a document.

You enable delegated authorization on the search appliance Admin Console when you set up a unified environment. Check the Use delegated authorization checkbox on the GSA Unification > Host Configuration page on the primary search appliance and on all secondary search appliances.

The following table tells you how to configure the primary and secondary search appliances for delegated authorization. Note that forms authentication with IP binding, in which the authentication cookie is restricted to a single IP address, is not supported.

Type of User Authentication: HTTP Basic and NTLM HTTP for public serve
How the User is Authenticated and Results are Authorized: Results are public and authorization is not required.
What to do on the Primary Search Appliance: Check the Make Public check box on the Content Sources > Web Crawl > Secure Crawl > Crawler Access page on the search appliance where the content is crawled.
What to do on the Secondary Search Appliances: Check the Make Public check box on the Content Sources > Web Crawl > Secure Crawl > Crawler Access page on the search appliance where the content is crawled.

Type of User Authentication: HTTP Basic or NTLM HTTP for secure serve
How the User is Authenticated and Results are Authorized: User logs in to network domain. Credentials for authorization are collected on the Universal Login Form page and search results are authorized using head requests.
What to do on the Primary Search Appliance: Configure the required credential groups for content crawled on both the primary and secondary search appliance. Configure settings on the HTTP tab on Search > Secure Search > Universal Login Auth Mechanisms.
What to do on the Secondary Search Appliances: If the content is crawled on the secondary search appliance, set Crawler Access and Crawl URL patterns. Configure the required credential group and settings on the HTTP tab on Search > Secure Search > Universal Login Auth Mechanisms.

Type of User Authentication: Cookie site or forms-based authentication for public serve
How the User is Authenticated and Results are Authorized: Serve is public. No result authorization at serve time required.
What to do on the Primary Search Appliance: N/A
What to do on the Secondary Search Appliances: N/A

Type of User Authentication: Forms-based authentication for secure serve
How the User is Authenticated and Results are Authorized: Users provide credentials on the Universal Login Form configured on the primary search appliance. This process generates a cookie. The primary search appliance shares the cookie with the secondary search appliances, which use the cookie for authorization using the head requestor.
What to do on the Primary Search Appliance: Ensure that the primary search appliance shares the domain name with the source. Ensure that the secondary search appliances have access to the cookie generated on the primary search appliance. Configure the required credential group for content crawled on both the primary and secondary search appliance, and configure the Cookie tab on Search > Secure Search > Universal Login Auth Mechanisms.
What to do on the Secondary Search Appliances: Configure with role account and forms authentication for crawling, but ensure that secondary search appliances can use the same cookie generated on the primary search appliance for head requests. Configure the required credential group and settings on the Cookie tab on the Search > Secure Search > Universal Login Auth Mechanisms page.

Type of User Authentication: Multiple content sources protected by forms auth rules
How the User is Authenticated and Results are Authorized: On the Universal Login Form, URL patterns belonging to each of the content sources can be mapped to a Sample URL. The Sample URL is used with the user name and password the user enters for that credential group.
What to do on the Primary Search Appliance: Configure one credential group for each content source. Put the appropriate values for URL pattern and Sample URL in each credential group, for content crawled on both the primary and secondary nodes, on the Cookie tab on the Search > Secure Search > Universal Login Auth Mechanisms page. Note that even if both content sources share the same set of credentials, you must configure two different credential groups to map each URL pattern to a different Sample URL.
What to do on the Secondary Search Appliances: Configure the credential for content crawled on the secondary search appliance.

Type of User Authentication: SAML authentication with external authorization SPI
How the User is Authenticated and Results are Authorized: SAML assertion is passed to the secondary search appliances, where the assertion is used to authorize the user to view documents.
What to do on the Primary Search Appliance: Configure the required credential group for content crawled on both the primary and secondary search appliances, and configure the settings on the SAML tab on the Search > Secure Search > Universal Login Auth Mechanisms page.
What to do on the Secondary Search Appliances: Configure the SPI. Configure the required credential group and settings on the SAML tab on the Search > Secure Search > Universal Login Auth Mechanisms page.

Type of User Authentication: Forms-based authentication with external authorization SPI
How the User is Authenticated and Results are Authorized: SAML assertion is passed to the secondary search appliances, where the assertion is used to authorize documents.
What to do on the Primary Search Appliance: Configure the required credential group (for content crawled on both the primary and secondary search appliances) and settings on the Cookie tab on the Search > Secure Search > Universal Login Auth Mechanisms page.
What to do on the Secondary Search Appliances: Configure the SPI. Configure the required credential group and settings on the SAML tab on the Search > Secure Search > Universal Login Auth Mechanisms page.

Type of User Authentication: Kerberos authentication
How the User is Authenticated and Results are Authorized: If Kerberos/IWA is configured, silent authentication is used and the user is not prompted for credentials.
What to do on the Primary Search Appliance: Configure Kerberos on the Kerberos tab of the Search > Secure Search > Universal Login Auth Mechanisms page.
What to do on the Secondary Search Appliances: Configure Kerberos on the Kerberos tab of the Search > Secure Search > Universal Login Auth Mechanisms page.

Type of User Authentication: Single connector using content feeds
How the User is Authenticated and Results are Authorized: Authorization is performed by the connector. Authentication can be performed by the connector or by using any standard authentication method, such as LDAP, HTTP Basic, cookie-based, SSO, and others.
What to do on the Primary Search Appliance: Configure the authentication method on the Universal Login Auth Mechanisms page. Note that if authentication is performed by the connector using the connector authentication SPI, the connector must be configured on the primary search appliance as well as on any secondary search appliances. Use the same connector manager name and connector name for configuration on both search appliances.
What to do on the Secondary Search Appliances: Configure the connector for crawling on the Content Sources > Connectors page.

Type of User Authentication: Single connector using metadata and URL feeds
How the User is Authenticated and Results are Authorized: Authorization is performed by sending a HEAD request. Any authentication method can be used for authenticating users. Authentication is performed on the primary GSA. Because delegated authorization is configured, authorization takes place on secondary search appliances using a HEAD request.
What to do on the Primary Search Appliance: Configure the specific authentication method on the Search > Secure Search > Universal Login Auth Mechanisms page. If connector manager authentication is being used, configure a connector manager on the primary search appliance with the same name that is used for crawl on the secondary search appliances.
What to do on the Secondary Search Appliances: Configure the connector for crawling on the Content Sources > Connectors page. Because the connector sends content feeds, authorization is automatically performed by the connector and no special configuration is needed.

Type of User Authentication: Multiple connectors
How the User is Authenticated and Results are Authorized: To enable all connectors to use a single identity obtained from a standard authentication method (for example, SAML IdP, cookie-based, HTTP Basic), configure the tab corresponding to the authentication method in the Universal Login Form. To enable all connectors to use different authentication methods, the connector type must support the connector authentication SPI.
What to do on the Primary Search Appliance: To enable all connectors to use a single identity obtained from a standard authentication method (for example, SAML IdP, cookie-based, HTTP Basic), configure the tab corresponding to the authentication method in the Universal Login Form. To enable all connectors to use different authentication methods, configure connector information on the Connector tab on Search > Secure Search > Universal Login Auth Mechanisms. The connector must support authentication. Configure the credential group for the connectors configured on the primary as well as secondary search appliances.
What to do on the Secondary Search Appliances: For both use cases, configure the connectors for crawl. Configure the credential group for the connector on the secondary search appliance, if this appliance will be used to perform search.

Configuring Kerberos Authentication with Delegated Authorization

Use these instructions to configure Kerberos authentication in a unified environment that uses delegated authorization.

1. Configure a pair of search appliances in a unified environment.
2. On the GSA Unification > Host Configuration page, in the Unification Settings section, check the Use delegated authorization checkbox.
3. Enable Kerberos, using the instructions in “Kerberos-Based Authentication” in Managing Search for Controlled-Access Content.
4. Configure the different crawl patterns on the two search appliances on the Content Sources > Web Crawl > Start and Block URLs page. Ensure that you configure both Start URLs and Follow Patterns.
5. On a client machine that uses the Kerberos protocol for user authentication, open a browser. For example, this might be a Windows client machine that uses Active Directory on a Windows server for authentication.
6. Make sure the browser is configured as described in “Kerberos-Based Authentication” in Managing Search for Controlled-Access Content.
7. On the primary search appliance, go to the search page.
8. Type a search query that is expected to return results from URLs crawled on the primary and secondary search appliances. If such a query does not exist, search for a term that is expected to be found only on the secondary search appliance.

You should not be prompted for a user name and password and the search should return results that were crawled on the secondary search appliance.

About Composite Collections

A composite collection is a collection configured on the primary search appliance of a unified environment that includes one or more collections defined on one or more of the secondary search appliances. You can create two types of composite collections:

• Collections that include only collections from the secondary search appliances in the unified environment
• Collections that include collections from the primary and secondary search appliances

You create composite collections to ensure the following:

• User search queries are distributed to the correct search appliances.
• The correct collections on those appliances are searched.

There is no limit to the number of composite collections you can create on the primary search appliance. A particular collection on a secondary search appliance can be a member of more than one composite collection.

When you create composite collections containing collections from both the primary and secondary nodes, you designate logical operators governing which collections on which search appliances are searched in response to a query. For example, you might have a search appliance with three collections, called X, Y, and Z.

• X AND Y AND Z means that a document must be present in all three collections to be found.
• X OR Y AND Z is interpreted as (X OR Y) AND Z and means that the document must be in either X or Y and also in Z to be found.

In another example, you might have two search appliances, 1 and 2, where collections X and Y are on search appliance 1 and collection Z is on search appliance 2. X AND Y OR Z means that documents must be in both the X and Y collections or in the Z collection. The AND operator can be used only with collections on the same search appliance, such as X and Y in the immediately previous example. You cannot use AND with collections on different appliances.
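The operator rules above can be sketched in a few lines of Python. This is only an illustration of the semantics described in this section, not the appliance's implementation. It assumes that operators are applied left to right, which is consistent with both examples given here ((X OR Y) AND Z, and (X AND Y) OR Z). The function names and the appliance numbering are hypothetical.

```python
def matches(expression, doc_collections):
    """Evaluate an expression such as 'X OR Y AND Z' left to right.

    doc_collections is the set of collection names that contain the document.
    Left-to-right evaluation is an assumption drawn from the examples in the text.
    """
    tokens = expression.split()
    result = tokens[0] in doc_collections
    for operator, name in zip(tokens[1::2], tokens[2::2]):
        present = name in doc_collections
        if operator == "AND":
            result = result and present
        elif operator == "OR":
            result = result or present
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result


def and_crosses_appliances(expression, appliance_of):
    """Return True if any AND joins collections that live on different appliances."""
    tokens = expression.split()
    for i in range(1, len(tokens), 2):
        if tokens[i] == "AND" and appliance_of[tokens[i - 1]] != appliance_of[tokens[i + 1]]:
            return True
    return False


# Hypothetical layout from the second example: X and Y on appliance 1, Z on appliance 2.
layout = {"X": 1, "Y": 1, "Z": 2}
print(matches("X AND Y OR Z", {"Z"}))                   # document found through Z alone
print(and_crosses_appliances("X AND Y OR Z", layout))   # AND stays on appliance 1
print(and_crosses_appliances("X OR Y AND Z", layout))   # AND would span appliances 1 and 2
```

A check like and_crosses_appliances can be run against a planned expression before you enter it in the Admin Console, to catch an AND that joins collections on different appliances.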


In a unified environment of three search appliances, the administrator might configure a composite collection called MasterCollection on Search Appliance A as described in the following table.

Search Appliance Name          | Collections Included in MasterCollection                              | Collections Not Included in MasterCollection
-------------------------------+-----------------------------------------------------------------------+---------------------------------------------
Search Appliance A (primary)   | N/A                                                                   | All collections on Search Appliance A
Search Appliance B (secondary) | ProductOneSupportColl, ProductTwoSupportColl, ProductThreeSupportColl | WhoDoesWhatCollection
Search Appliance C (secondary) | CustomerDataColl, CustomerPeopleDataColl                              | BonusInfoCollection

When a user issues a search query on Search Appliance A, the search appliance queries all collections on itself and the collections included in the collection called MasterCollection, but does not search the collections on the secondary appliances that are not included. Users who need results from the WhoDoesWhatCollection on Search Appliance B or BonusInfoCollection on Search Appliance C need to issue queries directly on those search appliances, because Search Appliance A does not have access to those collections through MasterCollection.

Observe the following cautions in creating composite collections:

• Do not configure composite collections on a secondary search appliance.
• Do not give a composite collection the name default_collection, which is reserved for the default collection on each search appliance.

About Front Ends

A front end is the search appliance framework used to manage the appearance and underlying functions of search and results pages, including which collections are searched. After you create the composite collections in a unified environment, modify the front ends on the primary search appliance to associate the correct composite collections with each front end. You can do this in two ways:

• Add an element to the search page that enables users to select a collection. For example, you might use radio buttons or a drop-down list.
• Use query parameters to bind a collection to a front end, then mask the query parameters using a proxy server.
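As a rough illustration of the second approach, the sketch below builds a search URL that names both a collection and a front end. It assumes the site (collection) and client (front end) request parameters of the search appliance's search protocol. The host name, collection name, and front end name are hypothetical placeholders.

```python
from urllib.parse import urlencode


def search_url(host, query, collection, front_end):
    """Build a search request URL that binds a collection to a front end.

    'site' selects the collection and 'client' selects the front end;
    all values passed in here are placeholders for illustration.
    """
    params = {"q": query, "site": collection, "client": front_end}
    return f"http://{host}/search?{urlencode(params)}"


print(search_url("gsa-primary.example.com", "vpn setup",
                 "MasterCollection", "support_frontend"))
```

A proxy server placed in front of the primary search appliance could append these parameters itself, so that users never see or edit them.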

For more information on front ends and associating collections with front ends, see the “Introduction” of Creating the Search Experience.

In addition, unified environments can use remote front ends, which are front ends on secondary search appliances. You enable remote front ends by checking the Use host frontend filters instead of Primary frontend filters checkbox on the GSA Unification > Host Configuration page under GSA Unification Settings. You choose a front end on each secondary search appliance that is used to apply the following front end settings to results from that node:

• Remove URLs
• Scoring bias, which is called result biasing elsewhere on the search appliance
• File type filters
• Domain filters
• Metatag filters

About Crawl Patterns

Unified environments function more efficiently when the set of URLs crawled on one node has few or no links to URLs crawled on other nodes. Google recommends that you set up the crawl patterns on each node so that there is minimal interlinking among the nodes.

Depending on how results are authorized in your unified environment, you might need to copy crawl patterns or crawler access information from the secondary search appliances to the primary search appliance. For more information, see the tables in “About Authentication and Authorization within a Unified Environment” on page 9.

If a secondary search appliance uses SMB crawl patterns, you must add those patterns to the Follow Patterns field on the primary search appliance’s Content Sources > Web Crawl > Start and Block URLs page.

About Database Crawling

To use database crawling in a unified environment, you might need to perform some additional configuration.

• If you configure the primary search appliance to crawl the database, no additional configuration is required.
• If you configure a secondary search appliance to crawl the database, search results from the database are correctly returned to the primary search appliance. However, the primary search appliance cannot retrieve the database when the user clicks a result from the database. Use these instructions to set up the primary search appliance so that it can retrieve the database.

To set up the primary search appliance:

1. Log in to the Admin Console of the secondary search appliance.
2. Navigate to Content Sources > Databases.
3. Note down the configuration information.
4. Log in to the Admin Console of the primary search appliance.
5. Navigate to Content Sources > Databases.
6. Set up a database crawl configuration that is identical to the configuration on the secondary search appliance.
7. Configure a dummy SELECT statement for the crawl query that does not return documents. This prevents the primary search appliance from crawling the database. The serve query on the primary search appliance must be identical to the serve query on the secondary search appliance.
8. Save the configuration.

For more information on crawling databases with the Google Search Appliance, see “Database Crawling and Serving” in Administering Crawl.
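The idea behind the dummy crawl query in step 7 can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in for the real database. The table, columns, and queries are hypothetical, and the ? placeholder follows the Python sqlite3 module's convention; see “Database Crawling and Serving” for the syntax the appliance expects.

```python
import sqlite3

# In-memory stand-in for the crawled database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO articles VALUES (1, 'Release notes')")

# Crawl query on the secondary appliance: returns every row to be indexed.
SECONDARY_CRAWL_QUERY = "SELECT id, title FROM articles"

# Dummy crawl query on the primary appliance: valid SQL that never returns a row,
# so nothing from this database is crawled on the primary.
PRIMARY_DUMMY_CRAWL_QUERY = "SELECT id, title FROM articles WHERE 1 = 0"

# Serve query: must be identical on the primary and the secondary.
SERVE_QUERY = "SELECT id, title FROM articles WHERE id = ?"

crawled_on_secondary = conn.execute(SECONDARY_CRAWL_QUERY).fetchall()
crawled_on_primary = conn.execute(PRIMARY_DUMMY_CRAWL_QUERY).fetchall()
served = conn.execute(SERVE_QUERY, (1,)).fetchall()

print(len(crawled_on_secondary), len(crawled_on_primary), served)
```

A condition that is always false, such as WHERE 1 = 0, is one common way to write a query that is syntactically valid but returns no rows.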

About Timeout Intervals and Result Biasing

The timeout interval and scoring bias parameters are set on the GSA Unification > Host Configuration page for each search appliance in the configuration.

The timeout interval determines how long the primary search appliance waits before timing out a request to a particular secondary node. Set the timeout interval to a lower value for co-located search appliances and to higher values for search appliances that are physically distant from the primary search appliance. Google recommends a 2-second timeout value for co-located search appliances.

The scoring bias parameter sets result biasing for the current node. Scoring bias changes the weight assigned to results from a particular node in a unified environment when the final results ranking is calculated. Less influence is a negative bias for results from the current node. No influence is a neutral bias. More influence is a positive bias for results from the current node.
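To make the effect of a per-node timeout concrete, here is a generic sketch of a primary node that sends a request to several nodes and drops any node that does not answer within its own timeout. It illustrates the general concept only and does not reflect how the search appliance is implemented. The node names, delays, and timeout values are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def fake_node(delay_seconds, results):
    """Return a callable that simulates a node answering after a delay."""
    def run():
        time.sleep(delay_seconds)
        return results
    return run


def fan_out(nodes):
    """Query every node; nodes maps a name to (timeout_seconds, callable).

    A node whose answer does not arrive within its timeout contributes None.
    For simplicity, results are awaited one node at a time.
    """
    collected = {}
    pool = ThreadPoolExecutor(max_workers=len(nodes))
    futures = {name: pool.submit(call) for name, (_, call) in nodes.items()}
    for name, future in futures.items():
        timeout_seconds = nodes[name][0]
        try:
            collected[name] = future.result(timeout=timeout_seconds)
        except FutureTimeout:
            collected[name] = None  # the node timed out; its results are skipped
    pool.shutdown(wait=False)
    return collected


nodes = {
    "colocated": (2.0, fake_node(0.0, ["doc-a"])),   # answers immediately
    "remote": (0.05, fake_node(0.5, ["doc-b"])),     # timeout set too low for its latency
}
print(fan_out(nodes))
```

In this sketch the remote node's results are lost because its timeout is shorter than its response time, which is the situation the guidance above is meant to avoid for physically distant appliances.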

Unified Environment Checklist

This section provides a checklist of information you need to collect and decisions you need to make before you set up a unified environment.

Task: Determine which Google Search Appliances will participate in the unified environment.
Description: Any Google Search Appliance model running software version 6.0 or later can participate.
Your Values:

Task: Determine the appliance IDs of the participating search appliances.
Description: The appliance IDs can be found on the Admin Console under Administration > License.
Your Values:

Task: Determine the host names or public IP addresses of the search appliances in the unified environment.
Description: The host names or IP addresses are used during the initial configuration of the unified environment.
Your Values:

Task: Determine the network IP addresses for the search appliances.
Description: The network IP addresses are used for communication among the search appliances in the unified environment. The network IP addresses must conform to the private address space as defined in RFC 1918 and must not overlap with any other private address space in use on your network.
Your Values:

Task: Determine which search appliance is the primary search appliance in the unified environment.
Description: You configure composite collections only on the primary search appliance, and searches are typically entered on the primary search appliance.
Your Values:

Task: Determine which collections on each secondary search appliance will be assigned to composite collections on the primary search appliance. These collections will be served from the primary search appliance.
Description: These choices determine which collections are searchable within the unified environment using composite collections.
Your Values:

Task: Determine the secret token that the search appliances will use to recognize each other within the unified environment.
Description: The nodes in a unified environment use the secret tokens to authenticate to each other. The secret token must include only printable ASCII characters. Each search appliance in a unified environment has its own associated secret token, which you specify on the GSA Unification > Host Configuration page.
Your Values:

Task: Determine the level of scoring bias for each node in the unified environment.
Description: Scoring bias changes the weight assigned to results from a particular node in a unified environment when the final results ranking is calculated. Less influence is a negative bias for results from the current node. No influence is a neutral bias. More influence is a positive bias for results from the current node.
Your Values:

Task: Determine the timeout interval to enter on each node.
Description: The timeout interval determines how long the primary search appliance waits before timing out a request to a particular secondary node. Set the timeout interval to a lower value for co-located search appliances and to higher values for search appliances that are physically distant from the primary search appliance. Google recommends a 2-second timeout value for co-located search appliances.
Your Values:

Task: Determine the type of authorization to use in the configuration.
Description: Results can be authorized on the primary search appliance or on the secondary search appliances. For more information, see “About Security” on page 8 and “About Authentication and Authorization within a Unified Environment” on page 9.
Your Values:

Task: Confirm that the security configuration is identical on all of the search appliances in the unified environment.
Description: Do not use different authentication and authorization models on different search appliances in a unified environment. For more information, see “About Security” on page 8 and “About Authentication and Authorization within a Unified Environment” on page 9.
Your Values:

Task: Determine which crawl patterns and crawler access information need to be copied from the secondary search appliances to the primary search appliance.
Description: For more information, see “About Security” on page 8 and “About Authentication and Authorization within a Unified Environment” on page 9.
Your Values:

Task: Determine which front ends to use and how to ensure that the correct collections are bound to the front ends.
Description: The front end determines which collections are searched. For more information, see “About Front Ends” on page 17 and the “Introduction” of Creating the Search Experience.
Your Values:
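Two of the checklist items lend themselves to a quick pre-flight check: the network IP addresses (RFC 1918, no overlap with address space already in use) and the secret tokens (printable ASCII only). The following sketch uses only the Python standard library. The function names and sample values are hypothetical, and "printable ASCII" is taken here to mean code points 32 through 126; this document does not state whether the appliance accepts spaces in a token.

```python
import ipaddress

# The three private IPv4 ranges defined in RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Assumption: printable ASCII means code points 32 (space) through 126 (~).
PRINTABLE_ASCII = {chr(code) for code in range(0x20, 0x7F)}


def is_rfc1918(address):
    """Return True if the address falls inside an RFC 1918 private range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def overlaps_existing(address, networks_in_use):
    """Return True if the address falls inside a private network already in use."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(network) for network in networks_in_use)


def is_valid_token(token):
    """Return True if the token is non-empty and contains only printable ASCII."""
    return bool(token) and all(ch in PRINTABLE_ASCII for ch in token)


print(is_rfc1918("192.168.255.1"), overlaps_existing("192.168.255.1", ["10.1.0.0/16"]))
print(is_valid_token("s3cret-Token!"))
```

The explicit network list is used instead of the is_private attribute of ipaddress objects because is_private also covers ranges outside RFC 1918, such as loopback and link-local addresses.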

Setting up Unified Environments

This section provides high-level instructions for setting up unified environments. Use the online help system for detailed information about completing each page on the Admin Console.

To set up unified environments:

1. Read this document.
2. Complete the “Unified Environment Checklist” on page 19.
3. Log in to the Admin Console on the primary node.
4. Complete the GSA Unification > Host Configuration page on the primary node.
5. Log in to the Admin Console on each of the secondary nodes.
6. Complete the GSA Unification > Host Configuration page on each secondary node.
7. On the primary node, navigate to GSA Unification > Nodes Configuration and add each of the secondary nodes, including the secret token, appliance ID, host name, and network IP address for each secondary node. When you make changes on this page, the unified environment service restarts.
8. On each of the secondary nodes, navigate to GSA Unification > Nodes Configuration and add the primary node, including the secret token, appliance ID, host name, and network IP address of the primary node. When you make changes on this page, the unified environment service restarts. Do not add any of the secondary nodes on another secondary node.
9. On the primary node only, complete the Index > Composite Collections page.
10. On the primary node, make any required updates to the crawl patterns and crawler access information.
11. On the primary node, make changes to the front ends to ensure that queries are correctly distributed to all nodes in the unified environment.
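The topology rules in steps 7 through 9 can be expressed as a small consistency check over a written-down plan. The sketch below is a planning aid only: it does not talk to any appliance, and the data structures and node names are hypothetical.

```python
def topology_problems(primary, node_lists, composite_collections):
    """Check a planned unified environment against the setup rules.

    primary: name of the primary node.
    node_lists: node name -> set of nodes added on that node's Nodes Configuration page.
    composite_collections: node name -> list of composite collections configured there.
    Returns a list of human-readable problems (empty if the plan looks consistent).
    """
    problems = []
    for node in sorted(node_lists):
        if node == primary:
            continue
        if node not in node_lists[primary]:
            problems.append(f"{node} is not added on the primary node")
        if primary not in node_lists[node]:
            problems.append(f"the primary node is not added on {node}")
        others = node_lists[node] - {primary}
        if others:
            problems.append(f"{node} lists other secondary nodes: {sorted(others)}")
        if composite_collections.get(node):
            problems.append(f"{node} has composite collections; configure them only on the primary")
    return problems


plan = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "B"}}
for problem in topology_problems("A", plan, {"A": ["MasterCollection"]}):
    print(problem)
```

In the sample plan, node C lists node B, which violates the rule that secondary nodes must not be added on another secondary node.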

Adding or Deleting Nodes

If you add or remove search appliances in a unified environment, ensure that you update the following:

• GSA Unification > Nodes Configuration page on the primary node
• GSA Unification > Nodes Configuration page on the search appliance you are adding or removing
• Crawl patterns and crawler access
• Composite collections
• Front ends

Updating a GSA Unification Configuration

If you are updating an existing unified environment, follow these high-level instructions in conjunction with the update instructions.

To update an existing unified environment:

1. Using the instructions in this document, start the update process on each of the search appliances in the unified environment.
2. Update the system on each appliance.
3. Update the software on each appliance, but do not accept the updated software.
4. Log in to the Admin Console of the primary search appliance in the configuration.
5. Navigate to GSA Unification > Nodes Configuration.
6. Click the Edit link for each of the secondary search appliances in the configuration.
7. On the Test Mode drop-down list, choose Yes.
8. Click Save.
9. After you edit each of the secondary search appliances, click Apply Changes.
10. Log in to each of the secondary search appliances.
11. Navigate to GSA Unification > Nodes Configuration.
12. If any secondary search appliances are listed, perform the following steps.
    a. Click the Edit link for each.
    b. On the Test Mode drop-down list, choose Yes.
    c. Click Save.
13. Click Apply Changes whether or not there are any secondary search appliances.
14. Log in to the primary appliance.
15. Navigate to GSA Unification > Nodes Configuration.
16. Wait for the status buttons to turn green.
17. Perform enough basic searches in test mode to determine whether the system is working correctly.
18. If you are satisfied, accept the software update on each of the appliances.
19. On each search appliance, navigate to the GSA Unification > Nodes Configuration page and change the Test Mode drop-down list to No for each of the listed secondary search appliances.
20. Click Apply Changes on each appliance.
21. On the primary search appliance, wait for the status button to turn green on the GSA Unification > Nodes Configuration page.

The update is complete.

Setting up Mirroring in a Unified Environment

The search appliance supports mirroring for a node participating in a unification configuration. For example, suppose your unification configuration contains a primary node A and a secondary node C. You want to add node B as a mirror of node A and node D as a mirror of node C. To accomplish this:

• On node A, add node B as a replica
• On node B, add node A as a primary
• On node C, add node D as a replica
• On node D, add node C as a primary

There is no automatic failover, so if a node goes down, you have to adjust the topology manually. For example, if node A goes down, you would have to direct the serve traffic to node B while also adding node C as the unification secondary of node B. Similarly, if node C goes down, you would have to add node D as unification secondary of node A.

If you want to set up mirroring, but not in a unified environment, use GSA mirroring in a GSAn configuration. For more information, see Configuring GSA Mirroring.

To configure mirroring in a unified environment:

1. On the Admin Console for the primary node, navigate to GSA Unification > Nodes Configuration.
2. In the Nodes in Your GSA Unification Network section, click Add.
3. On the drop-down list, designate the remote search appliance for mirroring as a Replica node.
4. Type in the Appliance ID of the remote search appliance.
5. Type in the Appliance Hostname or IP address of the remote search appliance.
6. Type in the GSA Unification Network IP Address of the remote search appliance.
7. Type in the Secret Token of the remote search appliance.
8. Click Save.
9. In the Nodes in Your GSA Unification Network section, click Add.
10. On the drop-down list, designate the remote search appliance for mirroring as the Primary node.
11. Type in the Appliance ID of the remote search appliance.
12. Type in the Appliance Hostname or IP address of the remote search appliance.
13. Type in the GSA Unification Network IP Address of the remote search appliance.
14. Type in the Secret Token of the remote search appliance.
15. Click Save.

Troubleshooting

This section provides information for solving problems you might encounter in configuring or using unified environments.

Using the GSA Unification Network Stats and GSA Unification Diagnostic Pages to Find Problems

On the Admin Console, the GSA Unification Network Stats and GSA Unification Diagnostic pages provide statistical and diagnostic information you can use to diagnose problems with a unified environment. For more information, see the online help for the pages.

After you click Apply Configuration or Apply Changes on the Nodes Configuration or Host Configuration page, the search appliance takes up to two minutes to complete all background processes. The network statistics page is not automatically refreshed after all processing is complete. Wait two minutes, then refresh the browser by clicking its Refresh button. You should see green indicators at that time.

Users See 404 Errors After Clicking Results

Several configuration problems can cause 404 errors when users click search results.

Check the URL patterns in the Follow and Crawl Only URLs settings on the primary and secondary search appliances. Ensure that all Follow and Crawl Only URLs on the secondary appliances also appear on the primary search appliance.

If you are using a database crawl, a user might see a 404 error after clicking a search result. When this happens, it means that the primary search appliance is not set up with the database configuration information from the secondary search appliances. To correct the error, copy the database configuration information from the secondary search appliances to the primary search appliance.
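If you keep a copy of each appliance's patterns in a text file or spreadsheet, the pattern check described above reduces to a set difference. The sketch below assumes you have already exported the patterns by hand; the function name, node names, and patterns are hypothetical.

```python
def missing_on_primary(primary_patterns, secondary_patterns_by_node):
    """Report, per secondary node, the patterns that the primary does not have.

    Nodes whose patterns all appear on the primary are left out of the result.
    """
    primary = set(primary_patterns)
    report = {}
    for node, patterns in secondary_patterns_by_node.items():
        missing = sorted(set(patterns) - primary)
        if missing:
            report[node] = missing
    return report


primary_patterns = ["http://intranet.example.com/", "http://support.example.com/"]
secondary_patterns = {
    "B": ["http://support.example.com/"],
    "C": ["http://customers.example.com/", "smb://files.example.com/share/"],
}
print(missing_on_primary(primary_patterns, secondary_patterns))
```

Any pattern that appears in the report is a candidate for being added on the primary search appliance.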

Results from Secondary Search Appliances are Not Available on Primary Search Appliance

If you find that results from the secondary search appliances are not available on the primary search appliance, check the names of the composite collections. If different collections designated as part of a composite collection have the same name, the site parameter is expanded at query time in such a way that the results are not available on the primary search appliance. If this is the case, you can obtain results from the secondary search appliances on http://0:9999/search, but not through the configured front ends.

If you find that results from the secondary search appliances are not available on the primary search appliance, ensure that nodes are added as secondary nodes only on the primary search appliance. Do not add secondary search appliances to other secondary search appliances. In addition, ensure that composite collections are configured only on the primary search appliance.

Unexpected Authorization Behavior

If you configure delegated authorization incorrectly, you encounter unexpected authorization behavior. If you are using delegated authorization, ensure that it is enabled on the primary and all secondary search appliances in the unified environment.

• If delegated authorization is enabled only on the primary search appliance and not on the secondary search appliances, the secondary search appliances do not perform authorization. Only the primary search appliance performs authorization.
• If delegated authorization is disabled on the primary search appliance but enabled on the secondary search appliances, the secondary search appliances perform authorization, but the primary search appliance ignores the authorization and performs its own.
