Site Server 3.0 - Cramsession
Deployment:
Content deployment distributes content across directories on multiple servers and across remote secure networks, to multiple destination servers. Data validation and restart capabilities ensure reliable deployment. You can deploy files, directories and ACLs (access control lists).
Content deployment uses both staging and end-point servers. The staging server receives content from authors and administrators, stages content for review, and deploys content. The end-point server receives content from the staging server.
The content deployment process:
1. The administrators create projects or routes on the staging and end-point servers: one to deploy content and one to receive it. You cannot run a project unless matching projects are configured at each end.
2. The content to be deployed is either submitted by an author or retrieved from an Internet server by the administrator.
3. Content deployment replicates the content from the staging server to the end-point server.
4. Both the staging and the end-point servers generate reports for the administrator, documenting deployment status.
Content deployment servers retrieve and deploy content based on their project and route definitions. The staging server runs Windows NT. It receives content from content authors, other staging servers, and HTTP/FTP servers. It then tests the content before deploying it to end-point servers or other staging servers.
Testing enables the administrator to review the deployed content and ensure that all the links are functioning properly.
The end-point server runs Windows NT or UNIX. It receives data from the staging server. The end-point server cannot deploy content, but is used to handle user requests for web pages. UNIX servers can only be used as end-point servers.
Before deploying content the administrator needs to configure all staging and end-point servers with the appropriate properties. These properties determine:
1. How event messages will be distributed among various event stores
2. Who should receive email reports
3. Which servers will receive content posts
4. How many rollbacks to allow
Server properties are set individually on all of the servers at your site.
There are 3 possible configurations that can be used.
1. Single-point server: involves only one server that contains a source and a destination directory. Content authors submit to the source directory, where the administrator tests and approves the content before deploying it to the destination directory. This configuration is common at small intranet sites.
2. Staging/end-point configuration: Content is deployed from a single staging server to one or more end-point servers. This configuration is common in intranets and small ISPs.
3. Multiple stage and deploy configuration: Multiple staging servers and multiple end-point servers move content around a complex, geographically distributed Internet site. Content is distributed through a chain of staging servers around the world. This configuration is normally used by large ISPs that require mirrored content on multiple servers.
Content deployment uses 3 kinds of projects to replicate content:
1. Content deployments: used when deploying content from a staging server to mid-point or end-point servers
2. Internet retrieval: to retrieve content from an http/ftp server
3. Component deployment: to deploy Java applets and COM components packaged in .cab files
You can use the content deployment wizard instead of creating the projects manually.
Configuring projects requires specifying information about the content you want to stage and deploy. After you specify the type of project you must specify:
1. Which content to include in the project
2. Where to deploy the content
3. When to deploy the content
4. Project security information
5. Who receives project status reports
Sources and destinations are the most important elements of the content deployment project. You must create a project on EVERY server included in the replication.
Content deployment projects include:
1. End-point servers: If an end-point server is the destination, you must first create the project on the server before the server can be a destination
2. Routes: using routes, any project can be stored in a route directory
3. Directories: directories can be destinations
Replicating metabase information enables you to clone your web servers. The only part of the configuration that is not cloned is TCP/IP configuration.
Routes are predetermined paths that staging and end-point servers use to deploy and receive content.
After a route has been created, an identically named project on every server in that route is subsequently created so that you can deploy content to those servers. The administrator must create the route on every server in the route.
There are 2 ways to create a route:
1.
2.
Provide the following to each server:
1. The route name
2. The route directory: which contains the route and all projects that use the route
3. The names of all servers included in the route
You can change the servers in a route at any time, either by adding new ones or removing existing ones. You can delete a route using MMC or WebAdmin. ALL projects using the deleted route will be affected. When deleting the route consider the following:
1. All replications following that route should be complete
2. You should delete the route from all servers in the route
3. The route will be deleted from all project definitions of all projects using that route.
Filtering content enables you to determine at a file or directory level what to include in and what to exclude from replications. Creating a filter requires building an ordered list of files or directories to include in or exclude from a replication. Adding content filters to a project does not affect content that is already deployed.
Filters are evaluated and applied in the order displayed on your screen. Since one filter can undo another, it is recommended that you think about all of the filters you want to apply and then begin with the most general filter and work toward the most specific.
When using filters, please note the following:
1. When a directory is excluded, all files within that directory and all of its subdirectories are excluded.
2. When a file is included, its parent directory is also included.
3. If an invalid filter is included, no content is included
4. If an invalid filter is excluded, no content is excluded.
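The effect of filter ordering can be illustrated with a short Python sketch. This is not how Site Server implements filters: the `(action, pattern)` rule format, the `is_deployed` helper, and the "last matching filter wins" policy are all assumptions made for the example. It shows why a general filter placed first can be refined, or undone, by a more specific filter placed later.

```python
from fnmatch import fnmatchcase

# Hypothetical ordered filter list: (action, pattern), evaluated top to bottom.
FILTERS = [
    ("include", "*"),                  # most general filter first
    ("exclude", "drafts/*"),           # excludes the directory and everything below it
    ("include", "drafts/final.html"),  # a later, more specific filter undoes part of that
]

def is_deployed(path, filters, default=True):
    """Return True if `path` survives the ordered filter list.

    Assumption: the last filter whose pattern matches the path decides.
    """
    decision = default
    for action, pattern in filters:
        if fnmatchcase(path, pattern):
            decision = (action == "include")
    return decision

files = ["index.html", "drafts/notes.txt", "drafts/old/a.txt", "drafts/final.html"]
deployed = [f for f in files if is_deployed(f, FILTERS)]
print(deployed)
```

Swapping the last two filters in this sketch would exclude `drafts/final.html` again, which is the kind of accidental undo that the general-to-specific ordering is meant to avoid.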
Securing your projects requires configuring project users, authentication accounts, and ACL deployment.
You can effectively secure your project by creating groups, each with its own set of access privileges. Windows NT and site server administrators have full control.
Members of the publishing operators group perform service operations, such as starting, stopping and rolling back projects.
The publishing administrators can administer local and remote servers. Their tasks include:
1. Project administration tasks: adding, deleting, and editing routes. Adding and deleting users to/from projects
2. Server administration tasks: adding/removing servers, modifying server properties, starting/stopping/pausing content deployment.
When content is replicated, the sending server must present valid Windows NT credentials to the receiving staging or end-point servers. The content sent is signed but NOT encrypted: the signature means the content cannot be changed in transit without detection, but it does not keep the content private.
Projects can be scheduled:
1. Manually
2. Automatically: Whenever content changes
3. Scheduled: at the times you specify
4. Scheduled with the apply option: when the content is applied
Administrators can post data via upload pages or Posting Acceptor. Posting Acceptor is a server-based tool that receives content from content authors over HTTP. Posting Acceptor can forward or repost data. FrontPage 98 and Visual InterDev can be used to post data.
To monitor deployment, use the Content Deployment service. The MMC project pages show the status of a project, and Performance Monitor maintains a record of transmission and authorization data. After analyzing the data you can make adjustments, if necessary.
The rollback command is similar to an undo command. You can specify as many rollbacks as you want, but you cannot roll back content at the source.
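The undo-like behaviour of rollbacks can be pictured with a small Python sketch. The `Destination` class and its `max_rollbacks` parameter are hypothetical and only model the idea of a destination that remembers a configurable number of earlier deployments; they do not describe how Site Server stores rollback data.

```python
from collections import deque

class Destination:
    """Hypothetical destination that keeps a bounded history of deployments."""

    def __init__(self, max_rollbacks):
        self.current = None
        # Oldest entries are dropped automatically once the limit is reached.
        self.history = deque(maxlen=max_rollbacks)

    def deploy(self, content):
        if self.current is not None:
            self.history.append(self.current)
        self.current = content

    def rollback(self):
        if not self.history:
            raise RuntimeError("no earlier deployment to roll back to")
        self.current = self.history.pop()
        return self.current

site = Destination(max_rollbacks=2)
for version in ["v1", "v2", "v3", "v4"]:
    site.deploy(version)

print(site.rollback())  # most recent earlier version
print(site.rollback())  # one step further back
```

With a limit of 2, only two steps back are possible in this sketch; a third `rollback()` call raises an error because `v1` has already been discarded.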
3 standard reports are available:
1. Project reports
2. Replication reports
3. Full reports
You can configure the server to send these reports by email.
Search:
Site server includes a search feature similar to Index Server. The main differences between the 2 are:
Functionality | Site server | Index server
Crawl capabilities | Generated Web content, Web sites, Exchange public folders, ODBC databases | File system files
Number of computers that can be searched | Multiple computers, including Internet sites | One location
Integration with Windows NT security | Across multiple computers | Single server
Updates of catalogs | Scheduled full crawls, with incremental crawls | Automatic, based on file change notification
Automatic distribution of catalogs to other servers | Yes | No
Distributed indexing | Yes | No
Multiple catalogs searching | Yes | No
Administration | Wizard, MMC, WebAdmin, command line | MMC and WebAdmin
A catalog is a document index containing information about a document, but not the actual document. For each catalog you build, you must first create a catalog definition.
A catalog definition contains the instructions and parameters for building a catalog, and it is stored on the catalog build server (the host used to build catalogs).
Search builds a catalog by:
1. Using a catalog definition to gather content
2. Extracting words and attributes from collected documents
3. Creating a catalog index
4. Compiling and propagating the created catalog
Search uses 2 services: the Gatherer service to create catalogs, and the Search service to perform the actual search.
Search hosts can be of 2 kinds:
1. Build servers, which build the catalogs
2. Search servers, which store the catalogs and perform the actual searches. They must have access privileges that allow visitors to search them.
A single host can perform both operations, but Microsoft does not advise it.
You can conserve network bandwidth by placing the build server close to the location where documents are stored and the search server close to the site that visitors will search.
You can use multiple servers to distribute the load.
Collecting documents is called Gathering. There are 3 kinds of gatherings:
1. Web link crawl: Search can use HTTP to crawl documents by following links.
2. File crawl: Search can use the file protocol to crawl the files present in a directory. Search can crawl ANY file system that can be mounted remotely.
3. MS Exchange crawl: Search can use an Exchange public folder as a start address and crawl messages on a computer running MS Exchange Server by using the Exchange protocol.
The crawl history is a list of links that search has crawled. It is used to eliminate duplicate crawling of the same directory.
Search uses filters to extract the full text and the file and document attributes from the documents it gathers. Filters are standard plug-in modules that conform to the Microsoft IFilter interface.
To prevent the index from becoming filled with words that do not help visitors to find documents, search uses Noise Word lists.
3 actions to create/use a catalog: Gathering, Compiling and Propagating.
When a document is cataloged, Search determines which language the document is written in and then sets the value of “detectedlanguage” for the document.
When a search is conducted, the “detectedlanguage” value is checked. Depending on the language, different word breakers and word stemmers are used.
Word breakers are responsible for identifying words in a document.
Word stemmers are responsible for taking a word and generating grammatically correct variations of it.
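The roles of word breaking, noise-word removal, and stemming can be sketched in a few lines of Python. Everything here is illustrative: the noise-word list, the regular-expression word breaker, and the suffix-stripping `stem` function are simplistic stand-ins, not the language-specific components that Search uses. Note that the resulting index maps words to document names and does not hold the documents themselves.

```python
import re
from collections import defaultdict

NOISE_WORDS = {"a", "an", "and", "the", "of", "to", "is"}  # illustrative list only

def break_words(text):
    """Very rough word breaker: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(word):
    """Naive stemmer that strips a few English suffixes (an assumption, not a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(documents):
    """Map each stemmed, non-noise word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in break_words(text):
            if word not in NOISE_WORDS:
                index[stem(word)].add(doc_id)
    return index

docs = {
    "a.htm": "Deploying the content to servers",
    "b.htm": "The server deployed content",
}
index = build_index(docs)
print(sorted(index["deploy"]))
```

Because both "Deploying" and "deployed" reduce to the same stem in this sketch, a query for either form finds both documents, while noise words such as "the" never enter the index.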
A search system can use multiple servers. Because WebAdmin allows you to administer only the host on which it is running, you must use MMC to manage a multiple-host search system.
There are some options to keep in mind when specifying what to crawl:
Kinds of crawls:
1. Crawl all subdirectories under the start address or
2. Crawl by following links from the start address
You can limit page and site hops that search makes.
Kinds of files:
1. By specifying the type of files to crawl or to avoid
2. By selecting the protocol to use
3. By specifying whether to crawl links (URLs) with question marks.
How far search can go:
1. Directory depth
2. Page and site hops
3. Site and Path Rules
4. Specific Microsoft Exchange public folders
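A page-hop limit combined with a crawl history can be sketched as a breadth-first traversal. The link graph below is a made-up, in-memory stand-in for real pages, and the `crawl` function and its `max_page_hops` parameter are hypothetical names used only for this illustration.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: page -> pages it links to.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
    "/b/1": [],
    "/a/1/x": [],
}

def crawl(start, max_page_hops):
    """Breadth-first crawl that stops following links after `max_page_hops` hops."""
    history = {start}            # crawl history: pages already queued or visited
    queue = deque([(start, 0)])
    order = []
    while queue:
        page, hops = queue.popleft()
        order.append(page)
        if hops == max_page_hops:
            continue             # hop limit reached: do not follow links from this page
        for target in LINKS.get(page, []):
            if target not in history:
                history.add(target)
                queue.append((target, hops + 1))
    return order

print(crawl("/", max_page_hops=1))
print(crawl("/", max_page_hops=2))
```

The history set is what prevents the link from `/a` back to `/` from being crawled a second time.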
When crawling, Search identifies itself by user agent, by email address, or both.
To follow crawling etiquette Search enables:
1. Setting hit frequency rules
2. Obeying the rules of robot exclusion
3. Leaving behind an email address to contact in case of problems
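As an illustration of robot exclusion in general (not of Site Server's own crawler), Python's standard `urllib.robotparser` module can evaluate a robots.txt file. The robots.txt contents, the `ExampleCrawler` user agent, and the `may_crawl` helper are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, as a site might publish it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

USER_AGENT = "ExampleCrawler"  # hypothetical user agent string

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_crawl(url):
    """Return True if the robots.txt rules allow this user agent to fetch `url`."""
    return parser.can_fetch(USER_AGENT, url)

for url in ("http://example.com/public/index.htm",
            "http://example.com/private/report.htm"):
    print(url, may_crawl(url))
```

A well-behaved crawler would run a check like this before each request and skip any URL for which it returns False.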
Other options to set:
1. Setting resource usage. The amount of resources used by your system to build the catalogs
2. Access accounts: to access each catalog server
3. Crawling identification
4. System proxy information
5. Crawling timeout periods
6. Save and load configurations: to save configurations from one catalog server and load them on other servers
To log on to the search administrative interface you must be one of the following:
1. Administrator on the local host
2. Site server Search administrators
3. Site server Knowledge administrators
4. Site Server administrators
You must set up an administrative access account on the catalog build server that has administrator privileges on all hosts to which you want to propagate catalogs.
When accessing content to crawl, you must set up a content access account on the catalog build server to access external data.
Security depends on the kind of data you need to access:
1. File access: The content access account determines which files Search can access
2. HTTP access: Search first tries anonymous access, then uses the content access account
3. MS Exchange access: Exchange authenticates the content access account when Search crawls public folders
When configuring your search system, consider the following:
When creating a catalog definition consider the following:
1. The number of catalogs you need
2. The type of information for which site visitors will search
3. Whether to TAG your HTML documents before cataloging them
4. Where to start crawling and from which site to gather documents
5. Whether to adjust the sites hit frequency
6. Whether to set site rules that limit which site or path search crawls
7. Which files you want to crawl and which protocols to use
8. The type of information (attributes) to store in catalogs
9. Whether you need to change the default schedule for building catalogs
10. Whether to use different hosts for searching
If you want to add messages from Exchange public folders, you must first configure your search host with information about the computer running Exchange Server.
If you want to build a catalog that contains database records, you must create a database catalog definition and then setup/modify ASP pages.
Catalogs can be one of the following types:
1. Crawl catalog: built by crawling files
2. Notification: built by receiving information thru a notification source
3. Database: You build database catalogs by crawling a table in an ODBC database
Catalog definitions for crawl catalogs must contain the catalog name, the start address, and the crawling policy. They can also contain information on site and path rules, file types, propagation, and the build schedule.
Catalog definitions for notification catalogs must include the catalog name and the notification source. They can also contain information on the hosts to which to propagate and on how many documents to receive before updating catalogs.
Catalog definitions for database catalogs must contain the catalog name, the ODBC source, the table to catalog, and the database columns to use for content and description.
When building a catalog you have 2 build options:
1. Full builds: Start with an empty catalog and use the start addresses as the starting points for the crawl.
2. Incremental builds: Start with the start addresses and a previous catalog, and update the catalog with any changes to the content since the last crawl.
NOT all changes made to a catalog definition take effect in an incremental build: in some cases you must start again with a full build.
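The difference between the two build options can be sketched in Python. The catalog here is just a dictionary keyed by URL with a last-modified value, and `full_build` and `incremental_build` are hypothetical helpers; the real catalog format and change detection are not described in these notes.

```python
def full_build(source):
    """Start from an empty catalog and index every document in `source`."""
    return {url: {"modified": modified} for url, modified in source.items()}

def incremental_build(previous, source):
    """Start from the previous catalog and re-index only what changed."""
    catalog = dict(previous)
    reindexed = []
    for url, modified in source.items():
        if url not in catalog or catalog[url]["modified"] != modified:
            catalog[url] = {"modified": modified}
            reindexed.append(url)
    for url in list(catalog):
        if url not in source:        # document no longer exists at the source
            del catalog[url]
    return catalog, reindexed

# Assumed data: URL -> last-modified stamp (any comparable value works).
first_crawl = {"/a.htm": 1, "/b.htm": 1, "/old.htm": 1}
second_crawl = {"/a.htm": 1, "/b.htm": 2, "/c.htm": 1}

catalog = full_build(first_crawl)
catalog, reindexed = incremental_build(catalog, second_crawl)
print(reindexed)
```

In this sketch the incremental build touches only the changed and new documents, yet ends with the same catalog a full build of the second crawl would produce.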
You can view the status of a catalog at any time to see whether the catalog is in an idle, crawling, compiled, propagating, or error state.
Smaller catalogs have higher search performance. One way to decrease the size of a catalog is to reduce the number of retrievable attributes.
To run a search, you must specify which part of the catalog to search. Catalogs are organized by columns, and you must specify which column to search.
Results ASP pages are stored in a virtual directory on your web server that has Script permission.
Search performance can be optimized in 2 areas:
1. Cataloging performance: By minimizing use of other resources during cataloging and minimizing catalog space.
2. Searching performance
Cataloging can be improved by:
1. Configuring the server for maximum network throughput
2. Stopping Index Server (if not used)
3. Minimizing number of columns for catalog
4. Using incremental crawls when building a catalog
5. Scheduling catalog builds
6. Setting the site hit timing for crawling
7. Setting timeout periods for crawling
8. Setting the resource use on the catalog build server
Search performance can be improved in speed and accuracy by:
1. Improving performance on search server (e.g. stopping index server)
2. Decreasing the size of the search catalog
3. Decreasing number of catalogs
4. Setting the resource use of the search server
5. Search page design
6. How well your search page helps your users target results
Membership server is a collection of software components that manages Personalization and Membership (P&M) user data and other information. It performs 4 key functions:
1. Managing user registration and user data
2. Protecting and sharing user data
3. Verifying user identity
4. Controlling access to content on your site.
Each Membership server can have some of the following components:
1. Membership Directory: central repository of user data.
2. Authentication service: Tying together the various functions involved in site security
3. Active User Object (AUO): presents a single interface through which applications access and integrate data from multiple user directories
4. LDAP service: implements LDAP, an Internet standard for accessing user information in directories, and provides standard, platform-independent access to the Membership directory
5. Message builder service: constructs and sends mailings
Site server enables you to choose from many configuration options:
1. Single Server: Typically used for test and evaluation purposes or for small web sites. All components reside on a single computer.
2. Basic Multiple Servers: For larger sites. The Membership directory database is installed on a dedicated computer.
3. Replicated multiple servers: Suitable for high-end sites. Multiple application servers are deployed to support multiple application types and performance requirements. Each application server has a Membership server instance installed on it.
4. Dedicated LDAP configuration: It may be advantageous to put a tier of one or more dedicated LDAP service computers between the web servers and the Membership directory database and stop the LDAP services that reside on the web servers. This configuration can offer an ideal balance of security and efficiency: the application server can be exposed to the Internet, while the LDAP service can sit behind a firewall.
Configuration limitations:
1. User Data: User profiles with personalization and optionally passwords
2. Site data: Information about your site and organization, like the site vocabulary.
3. Membership Directory Schema: Defines the objects, data, and relationships of the user and site data in the Membership directory
The Directory tree is a representation of the Membership directory as a hierarchical structure, or tree, of data objects. There are two general categories: container objects, which have child objects in the tree, and leaf objects, which have no child objects.
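A tree of container and leaf objects can be modelled with a very small Python class. The `DirectoryObject` class and the object names (`o=ExampleSite`, `ou=Members`, and so on) are hypothetical and only follow the container/leaf distinction given above; they are not the actual Membership directory layout.

```python
class DirectoryObject:
    """One node in a hypothetical directory tree."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def is_container(self):
        # Following the text: a container has child objects, a leaf has none.
        return len(self.children) > 0

    def leaves(self):
        """Yield the names of all leaf objects at or below this node."""
        if not self.children:
            yield self.name
        for child in self.children:
            yield from child.leaves()

root = DirectoryObject("o=ExampleSite")
members = root.add(DirectoryObject("ou=Members"))
members.add(DirectoryObject("cn=alice"))
members.add(DirectoryObject("cn=bob"))
root.add(DirectoryObject("ou=Groups"))  # no children yet, so it counts as a leaf here

print(list(root.leaves()))
```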
The Authentication service retrieves user properties, including passwords, from the membership directory and supplies them to the AUO. It also validates the password provided by the user, by comparing it to the one in the membership directory.
The AUO is an Active Directory Service Interfaces (ADSI) Component Object Model (COM) object that you can configure to access and integrate user attribute data from a membership directory and other data sources. Using the AUO, you can create a virtual user attribute schema that can be accessed from any script or program.
User accounts can be created in 3 ways: with administrative pages, with analysis pages and with registration pages.
There are 3 types of users:
With membership authentication, you can create 2 kinds of user objects in the membership directory:
When you create a Membership directory, you must specify the authentication method used.
With Membership authentication user and passwords are stored in the Membership directory. With Windows NT authentication, they are stored in the Windows NT server directory database.
Membership authentication has advantages for Internet sites, while Windows NT authentication is useful for Intranet sites.
Windows NT methods of authentication: Cookie Authentication, Clear text/Basic Authentication, Windows NT challenge/response, and client certificate
Membership Authentication methods: Automatic cookie, Clear text/Basic, HTML forms, distributed password, and client certificate.
In distributed password authentication, the user’s identity is validated by password only.
Public is a P&M built-in group to which each user belongs.
When a user tries to authenticate and fails, the user is routed to the authentication pages. The name of this file is privilegedcontent.asp, and all users have access to this file.
There are 4 levels of protections you can apply to your content:
1. Public content: not protected
2. Registered content: Users must fill in a Form
3. Secured content: only registered users can access
4. Subscribed content: provided to a subset of your registered users.
When a user requests a particular file, access control checks whether the user has permission, using ACLs (access control lists).
Personalization and Membership enables any Internet site to present unique personalized content automatically to specified users, using a variety of delivery mechanisms.
Before P&M can use content from each source, authors must first identify (tag) the content with attributes that administrators define to address specific user needs.
User profiles are stored in the Membership directory, and are the primary source of user information for personalization. They contain a set of demographics and user preference data that is used to provide a more personalized experience to site visitors.
Personalization rules are statements that test a condition and then perform an action when the condition is true.
Personalized information can be delivered thru personalized Web Pages, email or push channels.
The membership directory schema is the data structure that defines how user profiles are stored.
Attribute schema objects define attributes. All attribute definitions are stored as attribute schema objects
Class schema objects define classes.
All objects in the membership directory, including user profiles, are an instance of a class, and every object is defined by and consists of a set of attributes.
Managing schema objects can involve the following tasks:
How configuration of attributes works:
One of the simplest ways to personalize is to add a user property to a web content template. There are 2 ways of doing this: by using the Insert Property DTC or by using VBScript.
Rule builder is the tool of rule manager used to create new rules or modify existing rules for delivering personalized information thru web pages/email. Rules are usually built in rule manager, and then saved in rule sets.
Rule exceptions set up conditions that prevent a rule from being executed, even when other conditions are met.
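The relationship between a rule's condition, its exception, and its action can be sketched in Python. The rule format, the profile attribute names (`interest`, `age`, `country`), the page names, and the `apply_rules` helper are all invented for this illustration and do not reflect how Rule Manager stores rules.

```python
# Hypothetical rule set: each rule has a condition, an optional exception, and an action.
RULES = [
    {
        "condition": lambda user: user.get("interest") == "sports",
        "exception": lambda user: user.get("age", 0) < 13,
        "action": lambda user: "sports_headlines.htm",
    },
    {
        "condition": lambda user: user.get("country") == "IT",
        "exception": None,
        "action": lambda user: "italian_news.htm",
    },
]

def apply_rules(user, rules):
    """Return the content selected by every rule that fires for this user profile."""
    selected = []
    for rule in rules:
        if not rule["condition"](user):
            continue                      # condition not met: rule does not fire
        exception = rule["exception"]
        if exception is not None and exception(user):
            continue                      # exception blocks the rule despite the condition
        selected.append(rule["action"](user))
    return selected

adult = {"interest": "sports", "age": 30, "country": "IT"}
child = {"interest": "sports", "age": 10}
print(apply_rules(adult, RULES))
print(apply_rules(child, RULES))
```

For the second profile the first rule's condition is met, but its exception prevents the action, so nothing is selected.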
Personalized web pages are created with the rules you specified before.
Personalized email can also be sent, with Direct Mailer. Direct Mailer retrieves the custom text files and then assembles them into an email that is sent to the user.
You can have 2 types of distribution lists: static or dynamic, depending on whether you created them yourself or they are generated dynamically from a database.
Knowledge Manager is a web-based application that can filter documents by area of interest, define and schedule briefs regarding content, and browse files by category.
It’s mainly used to enable visitors to find information and receive updates when information is added or changed.
Knowledge Manager makes information available in several ways:
The search page is essentially a query page to get information.
Knowledge Manager enables site visitors to learn from each other by using briefs prepared by co-workers, the administrator, local experts, or other site visitors.
Briefs are documents that contain the information organized around a specific topic. Briefs are composed of 2 types of sections: saved search sections, which are saved search queries, and link list sections, which are lists of useful URLs along with their descriptions.
The Brief delivery page offers users a choice of how they want to receive brief updates. You can receive brief updates through email or your personal briefing channel.
Channels are conduits thru which information is stored and delivered. They are setup by administrators and are centered on topics. Users then decide to which channel to subscribe.
After setting up the search center and creating briefs, you only need to add defaults and links for your site to use knowledge manager. Basic configuration is in the config.idc file.
The Knowledge Manager database is in Access format and is located in the Mssiteserver\Data\Knowledge directory. The database contains the following tables:
You can use Analysis to analyze usage at your site, including who visits the site, where they go, and how long they stay.
Analysis offers the following tools:
If you are installing Site Server on multiple computers, it is advised that you install Analysis on a dedicated computer.
You can improve analysis by configuring IIS to gather the following information:
You MUST configure IIS to log in the W3C Extended Log File Format.
If your server is already logging data when you install a new filter, you need to RESTART your SERVER for the filter to take effect.
The data falls into 5 categories:
1. Hits: any request by the user
2. Requests: any hit that successfully retrieves content
3. Visits: series of requests by a user
4. Users: any entities associated with a hostname that access a site.
5. Organizations: group of related users that have registered one or more domain names.
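The first four categories can be illustrated by counting them from a few log entries in Python. The sample entries, the choice of 2xx status codes as "successful", and the 30-minute inactivity timeout that separates visits are assumptions made for this sketch; these notes do not state the exact criteria that Analysis applies.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # assumed gap that ends a visit

# Made-up log entries: (timestamp, client host, HTTP status).
LOG = [
    ("1999-01-01 10:00:00", "10.0.0.1", 200),
    ("1999-01-01 10:00:05", "10.0.0.1", 404),
    ("1999-01-01 10:05:00", "10.0.0.1", 200),
    ("1999-01-01 11:30:00", "10.0.0.1", 200),
    ("1999-01-01 10:01:00", "10.0.0.2", 200),
]

def summarize(log):
    """Count hits, requests, visits, and users in a list of log entries."""
    hits = len(log)  # every logged request counts as a hit
    requests = [
        (datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), client)
        for stamp, client, status in log
        if 200 <= status < 300  # assumption: only 2xx responses retrieved content
    ]
    visits = 0
    last_seen = {}
    for when, client in sorted(requests):
        # A new visit starts for a new client or after a long period of inactivity.
        if client not in last_seen or when - last_seen[client] > VISIT_TIMEOUT:
            visits += 1
        last_seen[client] = when
    return {
        "hits": hits,
        "requests": len(requests),
        "visits": visits,
        "users": len(last_seen),
    }

print(summarize(LOG))
```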
Usage import enables you to import log files into the Analysis database, after which you can start report writer and run reports on the data. With usage import, you can also delete imported log files and requests.
Use the following to configure/Manage usage import:
Once you have configured usage import, you can import log files with the import manager. When you import files, you can either import a single log file or several log files from one log data source in one import. By importing several log files in one import, the logs are stitched.
When importing from external data sources, you can import from:
To enrich data in your database, you can use:
1. IP resolution: (you must resolve the IP address before),
2. Whois queries
3. Title lookups.
You can use Scheduler to automate a number of tasks performed in Report Writer, Custom Import, and Usage Import. This is useful if you want to optimize system resources or automate regular tasks.
Using Scheduler consists of scheduling import jobs and adding tasks to each job. The import runs according to the schedule you specify. Once you have scheduled a job, you can choose to activate it. When a job runs, messages are logged to Uimport.log.
Report writer enables you to generate reports with which you can identify trends and your most popular pages, learn how users are navigating thru your site, and analyze where users come from.
Every report has a report definition. Report definitions are made up of elements that you can add, delete, or modify.
Basic elements of the report definition are:
You can run report writer from 3 interfaces: Windows interface, WebAdmin, Command line.
Site server provides a tool called content analyzer, to analyze content of a web site.
With content analyzer you can analyze resources and the routes that connect them.
You can access Content analyzer thru one of the 3 interfaces:
When you create a project for a web site, content analyzer explores the site, starting with the URL of the site you specify.
Content analyzer distinguishes between 5 kinds of resources:
A content analyzer project is a file that contains a map, a graphical view of your site. You create a new project by exploring the site you want to analyze. You can create a project starting from a URL or from a file in your file system.
When content analyzer finishes exploring the site it saves the results in a project and displays the site map. A project contains the following:
The options for exploring can be the following:
When you create your project you can choose to explore the entire site, or explore it with limits. You may want for example to analyze just part of a branch or explore a single page. You can do that by specifying the number of pages/levels.
Once you have created a project you can use content analyzer to:
You can use the Content Analyzer search option to find and analyze resources on your Web site. Content Analyzer offers both a quick search and a more advanced search that lets you specify limitless combinations of criteria.
Each Content Analyzer window offers different advantages:
The site window uses 2 ways of showing the site structure:
The analysis window displays the results of a search and the respective resources in 2 panes:
The properties window displays details about links and properties of the selected resource or page:
There are 3 kinds of link types:
When analyzing links, the main goal is to make sure that links connect the proper resources to one another. You can use the site window to examine the link structure. To begin examining a link you must FIRST select a resource and then select a link type to show.
Usually a link is broken for one of the following reasons:
Usage data can be gathered from log files and associated with your site. The information that you can gather consists of 2 types:
With this data you can:
Service | Port number
FTP | 21
Telnet | 23
SMTP | 25
HTTP | 80
SSL | 443
SQL | 1433