A Machine-Learning Approach to Discovering Company Home Pages

Wojciech Gryc, Prem Melville, Richard D. Lawrence
IBM T.J. Watson Research Center
P.O. Box 218, Yorktown Heights, NY 10598
{wgryc,pmelvil,ricklawr}@us.ibm.com

ABSTRACT

For many marketing and business applications, it is necessary to know the home page of a company specified only by its company name. If we require the home page for a small number of big companies, this task is readily accomplished via use of Internet search engines or access to domain registration lists. However, if the entities of interest are small companies, these approaches can lead to mismatches, particularly if a specified company lacks a home page. We address this problem using a supervised machine-learning approach in which we train a binary classification model to classify potential website matches for each company name based on a set of explanatory features extracted from the content on each candidate website. Our approach is able to identify a correct home page or recognize that a valid home page does not exist with an accuracy that is 57% better than simply taking the highest ranked search result as the correct match.

General Terms

Text categorization, text classification, web mining, crawling, data quality

1. INTRODUCTION

For many business-related applications, it is useful to know the Internet home page (or URL) for a specified set of companies. If the companies are large, e.g. Fortune 500 companies, this task can be accomplished easily by submitting each company name to an Internet search engine and capturing the first returned result. However, if we require the home page for a very large number of smaller companies, then such an approach is no longer feasible due to the sheer number of companies, and the observation that the first search return for a smaller company is less likely to be the correct home page. Indeed, many very small companies will have no website at all, and it is important that an automated capability detect such cases.

There are several major data vendors that offer detailed “firmographic” information on a very large number (on the order of millions) of companies, including very small companies. These data include information such as annual revenue, number of employees, the primary industry in which the company operates, and contact information including the company website. As noted above, it cannot be assumed that essentially all companies listed in such directories have a website. Specifically, in a random sample of 947 companies with annual revenue between $3M and $100M, 372 (39%) of these companies did not have an observed website. Hence, correct resolution of a company’s website, including those without a site, is a key data quality issue for companies that sell such data.

An important application that requires reliable website identification arises when we seek to join structured firmographic information for a large number of small companies with indexed content crawled from their respective websites. The resulting database of merged structured and unstructured content provides a rich set of data for machine-learning applications [10], and can also enable new focused search capabilities to address a range of marketing objectives. A key initial step in this process is automatically determining the correct home page for the tens of thousands of companies required in this application.

Domain name registration lists may be used to help identify the home page address for a given company. Registration of Internet top-level domains is managed by the Internet Corporation for Assigned Names and Numbers (ICANN). A top-level domain (TLD), sometimes referred to as a top-level domain name (TLDN), can be registered through domain-name registrars that have been accredited by ICANN. A number of companies have been accredited by ICANN to act as registrars in one or more TLDs, including, for example, .biz, .com, .info, .net and .org. Using TLD registration lists, it is possible to determine if a specified domain is currently registered, and if so, the name of the entity that registered the domain. However, the use of domain lookup can lead to incorrect results for companies with a small Internet presence. Many of these companies rely on other companies to build, host, and maintain their company websites. The company that develops the website may register the domain under its own company name, rather than the name of the requesting company. For example, the company Michigan Capital Finance has a home page associated with the domain name michcap.com. If this domain name is matched to a domain registration list (using, for example, http://www.whois.net), the named registrant is an entity ZWBALLCO, which is a different company that offers website hosting services to other companies. In other cases, registrants may choose to be listed as anonymous, making any decision on the company home page impossible. Hence, we require a more reliable procedure.

In this paper, we develop a machine-learning approach to discovering the home page for a specified company, and compare results with the non-learning approach of accepting the first return from an Internet search engine. The following section provides an overview of the problem and our approach, introducing aspects of the modeling which we cover in depth in subsequent sections.

2. PROBLEM SPECIFICATION

As noted in the Introduction, our objective is to determine the correct home page for a company specified only by its name. Figure 1 shows a high-level overview of the process to determine the URL given a Company Name. We refer to this overall objective as the Mapping Task, where a specific URL is mapped onto a Company Name.

This task can be formulated as a supervised machine-learning problem by generating a set of potential home page (URL) matches (or candidates) for each company, and then training a binary classification model against a labeled set of such examples. Such a model estimates the probability that a specific URL represents the home page of a given company. We refer to the process of scoring each individual candidate for a company name (a Company Name and URL match) as the Match-Evaluation Task, or simply as the Matching Task. The Mapping Task then aggregates these individual results to obtain a final recommendation for the specified company name.

The generation of candidate URLs is accomplished by submitting the company name to an Internet search engine, and retaining a small number of unique domains returned by such a query as candidate URLs for this company name. Explanatory features are obtained via analysis of the company name and the HTML tags and content on the candidate URL. Using these features, each candidate match is scored by a model (denoted as the Correct-Website Model) trained to estimate the probability that a given candidate match is correct. The best match is returned as the final result of the Mapping Task. If none of the candidate matches meets a specified confidence, the company is assumed to have no website.

Figure 2 illustrates the training of the Correct-Website Model. Note that this model accepts output from a second model designed to estimate the probability that a given URL is the home page of any company. We refer to this as the Company Website Model; it is also implemented as a binary classifier.
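The aggregation step of the Mapping Task described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `map_company_to_url`, the `score_match` callback (standing in for the Correct-Website Model), and the default threshold of 0.5 are all assumptions introduced here, since the text does not state the confidence value used.

```python
from typing import Callable, Optional, Sequence


def map_company_to_url(
    company_name: str,
    candidate_urls: Sequence[str],
    score_match: Callable[[str, str], float],
    min_confidence: float = 0.5,  # assumed value; the paper does not specify one
) -> Optional[str]:
    """Return the best-scoring candidate URL, or None for "no website".

    score_match(company_name, url) is a stand-in for the Correct-Website
    Model and should return an estimated probability that the match is correct.
    """
    best_url: Optional[str] = None
    best_score = float("-inf")
    for url in candidate_urls:
        score = score_match(company_name, url)
        if score > best_score:
            best_url, best_score = url, score
    # No candidates, or no candidate meets the required confidence:
    # the company is assumed to have no website.
    if best_url is None or best_score < min_confidence:
        return None
    return best_url
```

With a real model, `score_match` would compute the explanatory features for the pair and return the classifier's probability estimate; returning `None` corresponds to the "no website" outcome.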
The objective of each model can be made clear via the different training sets used in each. Given a Company i and candidate URL j, we denote a candidate match as <CompanyName_i, URL_j>. For the Correct-Website Model, we label each such match as Correct or Incorrect based on visual inspection of the website under the following definition:

A candidate match is a Correct example if it appears likely that this site was created by the named entity, either directly, or indirectly via a third-party website developer. The match is also considered Correct if it is clear that the company name is a subsidiary or recent acquisition of the parent company identified by the URL.

The training set for the Company Website Model is constructed by manually labeling a set of URL_j as Positive or Negative based on whether the site appears to be the home page for an actual company. This can be somewhat subjective, and we use the following guidance in generating these labels:

A URL_j is a Positive example if it clearly belongs to a business entity that produces some kind of goods or services, and the site appears to have the goal to promote some aspect of their business model.

We also label not-for-profit organizations like foundations and hospitals as positive examples for this model. Labeling positives here can be subjective, particularly for sites which are essentially web portals. For example, small real-estate companies may serve more as aggregation sites for other similar companies, and we do not include such sites as positive examples. The dominant source of negative examples for this model is the large number of sites which provide directory information for small companies (e.g. http://www.yellowpages.com). Such sites are often returned by search engines for cases where the company in question does not have a website, and it is important that our model learn to differentiate these examples.

The following sections elaborate on the specifics of the models as outlined in Figures 1 and 2. Section 3 discusses the details of the two training sets, Section 4 describes the generation of features, Section 5 describes the training of the two models introduced here, and Section 6 summarizes the performance of the models.

Figure 1: Overview of the Mapping Task

Figure 2: Training the Matching Model

3. DATA SETS

In this section, we describe the approach used to obtain the labeled data sets mentioned in the previous section. We choose a set of company names listed by Dun & Bradstreet (http://www.dnb.com), which maintains information on over 15 million companies worldwide. We restricted ourselves to companies in the US with an annual revenue greater than $3 million, which yields a set of approximately 375,000 companies. This set is then sampled uniformly to yield 1087 companies.

For each company name in this base list, we need to generate a set of potential URL matches or candidates. We build the list of candidates by submitting the company name to Google. Before submitting each company name, any legal identifiers (e.g. “Inc” or “LLC”) are stripped from the company name, and the remaining text is encapsulated in quotation marks to ensure that sites with the full company name appear in the search results. Depending on the uniqueness of the search terms, the number of results for each submitted company name can vary from zero (in a small number of cases) to hundreds of thousands. For our analysis, we retain only the top ten URLs as ordered by Google. Since the goal of this task is creating candidate home-page matches for each company, the resulting sites are stripped of any subdirectories to focus specifically on the domain, e.g., “http://www.ibm.com/sandbox/homepage/” becomes “www.ibm.com”, and so on. Each unique domain name here becomes a candidate match for the specified company. Note that due to the limited number of search results, some companies will have fewer than ten candidates.

As a result of this process, we have a set of potential matches. We use this list to generate the set of training examples through labeling by the authors, as well as using the “Amazon Mechanical Turk” service, described later in the paper. This process results in the following training sets:

• For 103 companies within the list of 1087, we label the set of unique URL_j to form the training set for the Company Website model, labeled as positive or negative using the criteria given in Section 2. This yields a labeled set of 628 examples, 176 of which are positive.
• The set of unique <CompanyName, URL> pairs forms the training set for the Correct Website model, labeled as positive or negative using the criteria given in Section 2. This yields a labeled set of 8551 examples, 802 of which are positive. Note that some companies have more than one positive result due to the use of multiple domain names or having subsidiary-specific home pages as well.

To aid in labeling, matches were submitted to the Amazon Mechanical Turk service (http://www.mturk.com/). This service allows one to submit a simple question, called a “Human Intelligence Task” (HIT), and have an individual (or “Worker”) provide an answer to each HIT in exchange for a nominal reward. For our labeling process, 9703 pairs were submitted as HITs, representing a total of 984 unique companies. To ensure a high quality of labeling, each pair was labeled as correct or incorrect by three different Workers. Validation by the authors showed that mistaken labeling tends to occur through false positives (i.e. a match is incorrectly labeled as correct) rather than false negatives. To deal with this, the 984 companies were filtered to include only the companies for which the authors are confident in their labels. A candidate match was accepted as correct only if all 3 Workers labeled it as correct. A Company Name was assumed to have no URL if none of the 10 candidate matches for this company received more than 1 correct label from the 3 Workers. If neither of these conditions was met for a specific Company Name, then the result was viewed as uncertain, and the company was discarded from further analysis. In this way, the number of companies labeled using the Mechanical Turk service was reduced from 984 to 844 companies. Combined with the examples labeled by authors, this resulted in a data set consisting of 947 companies, and 8551 pairs.
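The data-preparation rules of this section (query construction, reduction of URLs to domains, and the three-Worker filtering rule) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the `LEGAL_IDENTIFIERS` set is an assumed list (the text names only “Inc” and “LLC” as examples), and the search-engine call itself is omitted.

```python
from urllib.parse import urlparse

# Assumed list; the paper gives only "Inc" and "LLC" as examples.
LEGAL_IDENTIFIERS = {"inc", "llc", "corp", "ltd", "co"}


def build_query(company_name):
    """Strip legal identifiers and wrap the remaining name in quotation marks."""
    kept = [t for t in company_name.split()
            if t.strip(".,").lower() not in LEGAL_IDENTIFIERS]
    return '"' + " ".join(kept).rstrip(",") + '"'


def to_domain(url):
    """Drop scheme and subdirectories, keeping only the host name."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    return parsed.netloc.lower()


def unique_domains(result_urls, limit=10):
    """Keep the top `limit` results and collapse them to unique domains, in rank order."""
    domains = []
    for url in result_urls[:limit]:
        domain = to_domain(url)
        if domain and domain not in domains:
            domains.append(domain)
    return domains


def resolve_company(correct_votes_by_url):
    """Apply the three-Worker rule to one company.

    correct_votes_by_url maps each candidate URL to the number of Workers
    (0 to 3) who labeled the match as correct.
    """
    accepted = [u for u, votes in correct_votes_by_url.items() if votes == 3]
    if accepted:
        return "has_url", accepted
    if all(votes <= 1 for votes in correct_votes_by_url.values()):
        return "no_url", []
    return "uncertain", []  # such companies were discarded from the analysis
```

A company whose best candidate received exactly two "correct" labels falls into the `uncertain` case, which is how the 984 companies submitted to Mechanical Turk were reduced to 844.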

4. FEATURE CONSTRUCTION

In this section, we discuss the construction and analysis of the explanatory features used to build the classification models. We separate these features into two sets:

• structured features that reflect analysis of a website’s HTML tags and body, as well as other structural properties of the page such as the link structure; and,
• unstructured text features extracted from the raw content that appears on the web page.

The structured features are selected with the primary objective of providing insight into whether a specified candidate is a correct match. The unstructured features are included for the primary purpose of differentiating company websites from non-company websites as required in the Company Website model.

4.1 Structured features

Given a candidate match, the structured features capture the extent to which the company name is present in either the HTML tags or the body of the page at this URL. The structured features also capture other non-content-based attributes of the page, such as the number of ads on the page. Table 1 summarizes all structural features, sorted by information gain [7].

The general structure of a web page includes header information, encapsulated in a <head> tag, followed by the actual web page itself (as shown in a web browser) surrounded by a <body> tag. The title of the page itself is contained within a separate <title> tag embedded in the header. Many of the features (e.g. Features 1, 2, 6, 13) in Table 1 capture analysis of the <title> tag, effectively measuring the extent to which the company name is reflected in the title. The Levenshtein distance [8] is one such measure: an edit distance that counts the number of operations required to transform one string into another. We also examine descriptors in the <META> tags (Features 14, 16); however, the information gain associated with these features is relatively low, perhaps because many sites do not include such tags.

Another strategy we employ is to examine the hyperlink information and the link structure of a web page. Examples of the resulting features include the number of links to domains known to host ads (Feature 20), the number of links to other websites (Features 10, 22), and the ratio of number of words to the number of hyperlinks (Features 11, 15, 18). These features are intended to capture the observation that directories or portals often contain a large number of links and very little content, and hence can be excluded as potential matches for a specific company. However, aside from counting ad-related domains, it appears that the hyperlink structure provides little useful information for the model, as measured by the information gain.
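Two of the <title>-based features discussed above can be sketched as follows: the Levenshtein distance, and the count of company-name words found in the <title> tag together with its normalized form. This is a minimal sketch, not the authors' feature extractor. The function names are hypothetical, the regular-expression title extraction is a simplification of real HTML parsing, and normalizing by the number of words in the name is an assumption, since the text says only "normalized by length of company name".

```python
import re


def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def extract_title(html):
    """Return the text of the first <title> element, or an empty string if absent."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      flags=re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def name_words_in_title(company_name, html):
    """Return (count, normalized count) of company-name words present in the title.

    Normalization divides by the number of words in the company name,
    which is an assumption about the paper's definition.
    """
    title_words = set(re.findall(r"\w+", extract_title(html).lower()))
    name_words = re.findall(r"\w+", company_name.lower())
    count = sum(1 for word in name_words if word in title_words)
    normalized = count / len(name_words) if name_words else 0.0
    return count, normalized
```

The same `levenshtein` function could be applied to the title text or to the domain name, depending on which string is compared against the company name.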
If external information about a candidate website is available, this can prove extremely valuable in feature creation as well; by external, we refer to features that cannot be extracted from the content of the website itself. Since candidate matches include a URL obtained by submitting the company name to Google, we have both the domain name of the URL and the rank of this URL in the ordered list of sites returned by the Google search. This rank (Feature 4) is used within the baseline classifier, but is also one of the highest-ranked features for whether or not a company page is the correct one (see Table 1). The domain name is another external feature that is analyzed by taking the Levenshtein distance [8] between the company name and domain (Feature 7).

As discussed below, the list of features in Table 1 provides an accurate approach to discerning a correct URL match from a list of candidates. We note that it may be possible to improve the accuracy by crawling more than just the company home page. New features can undoubtedly be created for the home page, and it may also be beneficial to look at pages that are linked to the home page, or even the entire company site.

4.2 Analysis of structured features

To analyze the effects of the various structured features, a Logistic Regression model was created using the features in Table 1. The training set for this experiment was the 8551 labeled <CompanyName, URL> candidates described in Section 3, which contains 802 positive examples. We used 10-fold cross-validation and computed area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, as the performance metric. For the task of URL mapping, we are most concerned with getting an accurate ranking of potential matches, based on the estimated class probabilities. As such, our evaluation is focused on ROC curve analysis, as opposed to predictive accuracy.
Also, given the high imbalance between positive and negative matches in the data, classification accuracy, by itself, is not the metric we want to maximize: simply classifying all examples as negative achieves an accuracy of 90.6%, without being of any value to the mapping task.

To optimize the matching model, various sets of features were removed to observe their effects on the model’s AUC and classification accuracy. In each case, a Logistic Regression model was built combining all remaining features. The different sets of features used for this ablation study are listed below:

• “All features” model incorporates all features described in Table 1,
• “Ad data” refers to features that count advertisements or links thereto,
• “Link structure” focuses on hyperlink counting or using word and link ratios,
• “<title> features” include any features that use the text in the <title> tag,
• “Doc structure” features include any feature that incorporates HTML structure into its computation, such as counting words in <title> or <META> tags,
• “External features” (Features 4 and 7) are those that do not come from the page content itself, such as the domain name or Google search rank.

This analysis can provide useful information about the types of structured features that contribute predictive insight into classifying <CompanyName, URL> candidates. Table 2 summarizes the results in terms of AUC and accuracy from these experiments. Based on these experiments, it is clear that all features contribute to the model. By combining information on the web page’s link structure, advertising practices, and information on whether the specified company is mentioned at various parts of the site, the Logistic Regression model is provided with a general overview of the candidate site and can make a decision based on some of the features a human labeler would qualitatively explore.
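The two evaluation quantities used in this analysis can be made concrete with a short sketch: AUC computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, and the accuracy of the trivial all-negative classifier on the 8551 labeled pairs. The function names are hypothetical, and the quadratic pairwise AUC is chosen for clarity rather than speed; the paper does not say how its AUC values were computed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def all_negative_accuracy(n_examples, n_positive):
    """Accuracy obtained by labeling every example as negative."""
    return (n_examples - n_positive) / n_examples


# Base rate for the Correct Website data: 802 positives among 8551 pairs.
base_rate = all_negative_accuracy(8551, 802)  # (8551 - 802) / 8551
```

Because AUC depends only on the ordering of the scores, it is unaffected by the class imbalance that makes the 90.6% base-rate accuracy uninformative.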
What is encouraging is that no specific group of features appears to play an especially significant role in boosting the models’ AUC scores, implying that multiple features are used by the model in estimating the probability that a <CompanyN ame, U RL> match is correct. The “No external features” model in Table 2 is noteworthy because it is built solely on content-based structural features of the page itself, without using insight gained via use of a search engine. We observe that this model performs better than the Logistic Regression model that uses only external features (Features 4 and 7) – the AUCs are 0.939 and 0.833, respectively. Though the external features rank high<br /> <br /> # 1<br /> <br /> Table 1: Summary of Structured Features Description Count of the number of words in a company name appearing in the HTML <title> tag, normalized by length of company name Count of the number of words in a company name appearing in the HTML <title> tag Count of the number of words in a company name appearing in the HTML, normalized by length of company name The rank of the page as returned by the initial submission to Google<br /> <br /> Type Numeric<br /> <br /> Info Gain 0.16002<br /> <br /> Numeric<br /> <br /> 0.15442<br /> <br /> Numeric<br /> <br /> 0.13367<br /> <br /> Numeric<br /> <br /> 0.11919<br /> <br /> Numeric<br /> <br /> 0.10512<br /> <br /> Numeric<br /> <br /> 0.08166<br /> <br /> Numeric<br /> <br /> 0.06065<br /> <br /> 8<br /> <br /> Count of the number of words in a company name appearing in the HTML Levenshtein distance [8] of the content in the HTML <title> tag to the company name Levenshtein distance of the website’s domain name to the company name Does the entire company name appear somewhere in the HTML?<br /> <br /> Boolean<br /> <br /> 0.05998<br /> <br /> 9<br /> <br /> The length of the HTML source code<br /> <br /> Numeric<br /> <br /> 0.05119<br /> <br /> 10<br /> <br /> Count of the number of non-local links appearing on the page 
(i.e. hyperlinks to other websites) Ratio of words in the site content to all links on the page<br /> <br /> Numeric<br /> <br /> 0.04069<br /> <br /> Boolean<br /> <br /> 0.04001<br /> <br /> Count of the number of ads and ad-related key words on a site, as defined by links to ad-serving domain names Is the company name in the HTML <title> tag, without splitting the name into word tokens Count of the number of words in a company name appearing in the HTML <META> description, normalized by length of company name Ratio of words in the site content to all links on the page<br /> <br /> Numeric<br /> <br /> 0.03983<br /> <br /> Boolean<br /> <br /> 0.03876<br /> <br /> Numeric<br /> <br /> 0.03156<br /> <br /> Numeric<br /> <br /> 0.02656<br /> <br /> Numeric<br /> <br /> 0.02496<br /> <br /> 17<br /> <br /> Count of the number of words in a company name appearing in the HTML <META> description Does the website contain at least one link to an ad-serving domain<br /> <br /> Boolean<br /> <br /> 0.02483<br /> <br /> 18<br /> <br /> Ratio of words to non-local hyperlinks on the page<br /> <br /> Numeric<br /> <br /> 0.01791<br /> <br /> 19<br /> <br /> Ratio of words in the site content to all local links on the page<br /> <br /> Numeric<br /> <br /> 0.01779<br /> <br /> 20<br /> <br /> Count of the number of hyperlinks to ad-serving domains in the HTML code Do hyperlinks to ad-serving domains appear in the HTML code?<br /> <br /> Numeric<br /> <br /> 0.01569<br /> <br /> Boolean<br /> <br /> 0.01363<br /> <br /> The number of non-local hyperlinks on the page, normalized by the total number of links Does the company’s legal descriptor (e.g. 
“inc”, “llc”) appear in the HTML code of the site?<br /> <br /> Numeric<br /> <br /> 0.00757<br /> <br /> Boolean<br /> <br /> 0<br /> <br /> 2 3 4 5 6 7<br /> <br /> 11 12 13 14 15 16<br /> <br /> 21 22 23<br /> <br /> 1.0 0.8 0.6<br /> <br /> Apart from the structured features described in the previous section, we can also exploit the unstructured text content that appears on each candidate website. The web content by itself can be quite useful in determining if a candidate web page is a valid company site, as opposed to the website of a person, directory, web log, etc. For example, words like advertisement, yellow, directories, citysearch appear on web pages of directory services that might list company names, but are not the home page of the desired company. On the other hand, words such as products, services, contact are usually more commonly found on actual company websites. If a candidate URL is not the homepage of any company, then we can be sure that it is not a correct match for the specific company in question. One could build a list of keywords by hand to help determine if a page is likely to be a company page or not. However, we find it more efficient and accurate to train models to automatically learn this distinction. We describe this process in more detail below. For each unique candidate URL in our data, we download the corresponding web page and pre-process the text by removing stop words, stemming the words into inflected<br /> <br /> 0.4<br /> <br /> 4.3 Unstructured text features<br /> <br /> Given the data described above, we can now build textbased models to predict if a given URL is the web page of any company. We use the Company Website labels of the 103 company subset described in Section 3. Since this is a smaller data set, less stringent pruning is applied to the text data, with words discarded if they appear on less than five web pages. We then train a na¨ıve Bayes classifier using a multinomial text model [9]. As noted by Rennie et al. 
[11] and Frank and Bouckaert [4] na¨ıve Bayes trained on imbalanced data produces predictions that are biased in favor of large classes. To overcome this, we re-weighted the instances in the training data so that a positive instance has 3.5 times the weight of a negative instance (which corresponds to the imbalance in this data set). We evaluated this classifier using 10-fold cross-validation and present the resulting ROC curve in Figure 3. The results demonstrate that our model is quite effective in separating company pages from noncompany pages, based solely on the text that appears on the page — producing a model AUC of 0.809. Since we are primarily interested in finding correct URL matches for the company names in our data, we can further refine our Company Website model to only focus on the distribution of companies in our dataset. We do this by training our text classification models using only the company websites of correct matches in our dataset as positive examples of companies (as opposed to all company websites). This focused Company Website model performs even better, resulting in an AUC of 0.833 for the ROC curve shown in Figure 4. These results confirm that the raw text content is indeed quite a useful source of information for building Company Website models, which is turn can be used to drive accurate Correct Website models.<br /> <br /> 0.2<br /> <br /> in information gain, it is clear that they are insufficient for building a good Correct Website model. Further evidence of the inadequacy of external features, specifically search engine ranks, can be seen in Table 3. This table shows how many company home pages in our data set were ranked first, second, third, and so on by the search engine. Thus, 474 out of 575 companies (82.43%) with home pages had their websites returned as the highest ranking search result, but 17.57% did not. 
Combined with the knowledge that 39% of all companies have no clear web presence, this shows that simply trusting a search engine’s highest-ranked result is a sub-optimal strategy.<br /> <br />
4.4 Analyzing text features<br /> <br />
Table 3: Summary of Ranks of Company Home Pages<br />
<table>
<tr><th>Rank</th><th># of Companies</th><th>% of Companies</th></tr>
<tr><td>1</td><td>474</td><td>82.43</td></tr>
<tr><td>2</td><td>37</td><td>6.43</td></tr>
<tr><td>3</td><td>21</td><td>3.65</td></tr>
<tr><td>4</td><td>13</td><td>2.26</td></tr>
<tr><td>5</td><td>9</td><td>1.57</td></tr>
<tr><td>6</td><td>7</td><td>1.22</td></tr>
<tr><td>7</td><td>5</td><td>0.87</td></tr>
<tr><td>8</td><td>3</td><td>0.52</td></tr>
<tr><td>9</td><td>2</td><td>0.35</td></tr>
<tr><td>10</td><td>4</td><td>0.70</td></tr>
</table><br /> <br />
forms, and filtering out words that appear in fewer than 25 web pages. This results in a collection of 5777 web pages, represented by 4818 unique words, which we convert into vectors using the bag-of-words representation with TF-IDF term weighting [2]. Note that there are fewer web pages than total &lt;CompanyName, URL&gt; pairs because some URLs, like those of business directories, may be candidates for multiple companies.<br /> <br />
Table 2: Comparing different sets of structured features.<br />
<table>
<tr><th>Feature Set</th><th>AUC</th><th>Accuracy (%)</th></tr>
<tr><td>All features</td><td>0.953</td><td>93.9</td></tr>
<tr><td>No ad data</td><td>0.950</td><td>94.0</td></tr>
<tr><td>No link structure</td><td>0.952</td><td>94.0</td></tr>
<tr><td>No &lt;title&gt; features</td><td>0.937</td><td>93.3</td></tr>
<tr><td>No doc. structure features</td><td>0.935</td><td>93.2</td></tr>
<tr><td>No external features</td><td>0.939</td><td>92.6</td></tr>
<tr><td>Only external features</td><td>0.833</td><td>90.6</td></tr>
</table><br /> <br />
Figure 3: ROC curve for text-classification model trained to identify company websites. (Plot of True Positive Rate against False Positive Rate.)<br /> <br />
Figure 4: ROC curve for a focused Company Website model trained to identify websites for companies drawn from the distribution of correct matches. (Plot of True Positive Rate against False Positive Rate.)<br /> <br />
5. THE URL-MATCHING MODELS<br /> <br />
In Section 4.1 we describe the generation of structured features that can be used to identify &lt;CompanyName, URL&gt; matches; and in Section 4.3 we describe the use of unstructured text content for building Company Website models. In this section, we describe different approaches to combining the structured and unstructured features to build more accurate Correct Website models. We compare the following three approaches to incorporating the Company Website model into a Correct Website model.<br /> <br />
Voting: In this approach, we train a separate naïve Bayes Company Website model and a Logistic Regression Correct Website model (on the same training instances), and then average the class probability estimates produced by both models to get a revised estimate of a correct match.<br /> <br />
Nesting: In this approach, we use the output of the Company Website model as an input to the Logistic Regression model. Specifically, we add another variable in the Logistic Regression, corresponding to the predicted probability that the candidate website is a company website as given by a Company Website model. In order not to bias our evaluation, we build the Company Website model using the same training set as used by the Logistic Regression model. However, in order to train the Logistic Regression on the additional Company Website score, we need to provide values for this variable on the training data. We could do this by training a Company Website model on the training set and providing the scores on the same training data. However, the Logistic Regression trained on this input could be prone to over-fitting. Hence, a better approach is to use cross-validation on the training set to get unbiased scores from a Company Website model, i.e., the training set is further split into 10 folds, and the instances in each fold are scored by a Company Website model that has been trained on the remaining 9 folds. These unbiased estimates are then used as inputs to the Logistic Regression along with all the other structured features. In experiments not presented here, we confirmed that this unbiased-estimate approach does in fact improve on using training-set scores.<br /> <br />
Stacking: This approach is similar to voting, where two separate models are trained on the structured and unstructured features using a naïve Bayes classifier and Logistic Regression respectively. However, instead of simply averaging the output predictions, we train another Logistic Regression model to learn how to optimally combine the outputs of the base models, following Wolpert’s [12] approach to Stacked Generalization.<br /> <br />
6. EXPERIMENTAL EVALUATION<br /> <br />
In this section we discuss our results on URL matching, and the related task of mapping company names to home page URLs.<br /> <br />
6.1 Evaluating the matching task<br /> <br />
We evaluated the three Correct Website models described in Section 5 on our set of 8551 &lt;CompanyName, URL&gt; pairs. Since candidate URLs are retrieved through a search engine, a natural approach to URL selection is to simply pick the highest-ranked result. We refer to this baseline approach as the “I’m Feeling Lucky” (IFL) classifier, which classifies all first search returns as correct matches and all other candidate URLs as incorrect matches. In addition to this baseline, we also present results for the Correct Website models built using only the structured features described in Section 4.1. All experiments were performed using 10-fold cross-validation, and the AUC and accuracy are reported for all five classifiers in Table 4. The corresponding ROC curves are shown in Figure 5. For clarity of the figure, we only plot the three most relevant curves. Note that the IFL classifier only outputs predicted class labels, and does not provide class membership probabilities. 
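<br /> <br /> The cross-validated scoring described for the Nesting approach in Section 5 can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: `out_of_fold_scores`, `fit_centroid_scorer`, and the toy data are hypothetical names introduced here, and a simple centroid-based scorer stands in for the naïve Bayes Company Website model.

```python
import random


def fit_centroid_scorer(X, y):
    """Stand-in base model (an assumption, not the paper's naive Bayes):
    scores a vector by its relative closeness to the positive-class centroid."""
    dims = len(X[0])

    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        if not rows:  # class absent from this training split
            return [0.0] * dims
        return [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    pos, neg = centroid(1), centroid(0)

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        total = d_pos + d_neg
        return d_neg / total if total > 0 else 0.5

    return score


def out_of_fold_scores(X, y, fit, k=10, seed=0):
    """Score every training instance with a model that never saw it."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    scores = [None] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]  # the remaining k-1 folds
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            scores[i] = model(X[i])
    return scores


# Toy data: one structured feature per candidate, alternating labels.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(1.0 if label else -1.0, 1.0)] for label in y]

oof = out_of_fold_scores(X, y, fit_centroid_scorer, k=10)
# Nesting: append the unbiased score as one extra input column for the
# second-level (Logistic Regression) model.
nested_rows = [x + [s] for x, s in zip(X, oof)]
print(nested_rows[:3])
```

Because each score comes from a model trained without the instance being scored, the second-level model sees scores of the same quality it will see on unseen data, which is the over-fitting concern raised in Section 5.<br /> <br />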
Because the IFL classifier produces no probabilities, it is not meaningful to compute an AUC for it, and this field is left blank in Table 4. For the same reason, Figure 5 only presents the single operating point of the IFL classifier. In lieu of an ROC curve, we present the confusion matrix for the IFL classifier in Table 5. Of the 947 companies in the data set, 575 have home pages whose domains appear in the search results. Of these 575, 474 have their home pages as the top-ranked result. Thus, it is clear that the top-ranking sites are often the correct ones, if one assumes a home page exists. However, only 60.7% of company names yield a site, making the IFL classifier generally inaccurate. IFL classification yields a matching accuracy of 90.6%, which may seem high but is in fact equal to the base rate, i.e., the accuracy obtained by classifying all examples as negative. Compared to our baseline IFL, all the modeling-based approaches perform substantially better. This is evidenced by the fact that even the single operating point of IFL lies below the ROC curves of our matching models in Figure 5. The results also confirm that incorporating the unstructured text features via the Company Website model can significantly improve on using only structured features. Amongst the three models that use both structured and unstructured features, Stacking shows the best performance in terms of AUC. Clearly, learning how to combine the outputs of the Company Website model with the Logistic Regression model on structured features, as done by Stacking, is a very effective approach to solving the URL-matching problem. 
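<br /> <br /> The statement that the IFL accuracy equals the base rate can be checked directly from the confusion-matrix counts reported in Table 5. The short sketch below is illustrative only; the variable names are introduced here and are not from the paper.

```python
# Counts from Table 5 (rows: actual class, columns: IFL prediction).
tp = 474    # actual match, predicted match
fn = 328    # actual match, predicted no match
fp = 473    # actual no match, predicted match
tn = 7276   # actual no match, predicted no match

total = tp + fn + fp + tn                   # all <CompanyName, URL> pairs
ifl_accuracy = (tp + tn) / total            # IFL labels only rank-1 results as matches
all_negative_accuracy = (fp + tn) / total   # label every pair "no match"

# The single IFL operating point that Figure 5 plots against the ROC curves.
true_positive_rate = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)

print(f"pairs: {total}")
print(f"IFL accuracy: {ifl_accuracy:.1%}")
print(f"all-negative base rate: {all_negative_accuracy:.1%}")
print(f"IFL operating point: TPR={true_positive_rate:.3f}, FPR={false_positive_rate:.3f}")
```

Both accuracies round to 90.6%: the IFL rule labels 7750 of the 8551 pairs correctly, only one more than the 7749 obtained by labelling every pair as a non-match.<br /> <br />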
We therefore use Stacking for the evaluation in the following section.<br /> <br />
Figure 5: ROC curves for the three URL-matching models compared to the baseline “I’m Feeling Lucky” performance. (Plot of True Positive Rate against False Positive Rate; legend: Stacking, Voting, Nesting, IFL.)<br /> <br />
Table 4: Comparison of different URL-matching classifiers.<br />
<table>
<tr><th>Model Type</th><th>AUC</th><th>Accuracy</th></tr>
<tr><td>IFL</td><td></td><td>90.6%</td></tr>
<tr><td>Only Structured</td><td>0.953</td><td>93.9%</td></tr>
<tr><td>Voting</td><td>0.952</td><td>93.5%</td></tr>
<tr><td>Nesting</td><td>0.954</td><td>94.1%</td></tr>
<tr><td>Stacking</td><td>0.957</td><td>93.9%</td></tr>
</table><br /> <br />
Table 5: Confusion matrix for the IFL classifier.<br />
<table>
<tr><th></th><th>Predicted: A Match</th><th>Predicted: No Match</th></tr>
<tr><th>Actual: A Match</th><td>474</td><td>328</td></tr>
<tr><th>Actual: No Match</th><td>473</td><td>7276</td></tr>
</table><br /> <br />
6.2 Evaluating the mapping task<br /> <br />
The experimental results thus far show that evaluating a &lt;CompanyName, URL&gt; match can be accomplished with an AUC of over 0.95. However, as stated in Section 2, this matching problem is a sub-task of the real problem of URL-mapping, i.e., given a company name, map it to the appropriate home page URL. Given a Correct Website classifier, the mapping task can be solved in the following way. For a specific company name, a set of N (in our case, N = 10) candidate URLs is obtained from a search engine, resulting in N &lt;CompanyName, URL&gt; pairs. Each pair is then passed on to the Correct Website model, which returns a probability representing the likelihood that the pair is a correct match. If at least one probability score is above a specified threshold value, T, then the highest-ranked pair is considered to be the correct match, and the company name is then mapped to that URL. Otherwise, we assume that all candidate URLs are incorrect and that the company does not have a home page.<br /> <br />
The threshold T can be selected based on the relative costs or penalties of having false-positive versus false-negative matches. We consider two such cases, where T = 0.17 and T = 0.90. Both results, along with the results of the “I’m Feeling Lucky” classifier, are presented in Table 6. In addition to overall accuracies, we also report results separately for companies that have a valid website, and those that do not.<br /> <br />
Table 6: Comparison of the IFL classifier versus model-driven URL mapping using two different thresholds.<br />
<table>
<tr><th>Company Type</th><th># Companies</th><th>Predicted Class</th><th>IFL</th><th>T = 0.17</th><th>T = 0.90</th></tr>
<tr><td>With Valid Site</td><td>575</td><td>Correct</td><td>474</td><td>445</td><td>129</td></tr>
<tr><td></td><td></td><td>Incorrect (Missing)</td><td>0</td><td>47</td><td>431</td></tr>
<tr><td></td><td></td><td>Incorrect (Wrong Site)</td><td>101</td><td>83</td><td>15</td></tr>
<tr><td>Without Valid Site</td><td>372</td><td>Correct</td><td>0</td><td>298</td><td>370</td></tr>
<tr><td></td><td></td><td>Incorrect (Wrong Site)</td><td>372</td><td>74</td><td>2</td></tr>
<tr><td>All</td><td>947</td><td>Accuracy</td><td>50.1%</td><td>78.5%</td><td>71.8%</td></tr>
</table><br /> <br />
For companies without a website, two outcomes are possible: classifying all candidates as incorrect, or incorrectly identifying one candidate as the company home page. In the case of the IFL classifier, the highest-ranked result is always considered to be the correct one, and so it incorrectly identifies home pages for all companies without sites. Our model, on the other hand, is able to identify 298 (80.1%) and 370 (99.5%) companies as not having home pages, for T = 0.17 and T = 0.90 respectively.<br /> <br />
For the 575 companies that do have a home page, there are three possible outcomes: (1) the correct home page is mapped to the company, (2) the company is incorrectly classified as not having a home page, and (3) an incorrect home page is mapped to the company name. For this specific subset of companies, the IFL classifier does best, mapping 474 (82.4%) companies to their home pages, while our model does so for 445 (77.4%) and 129 (22.4%) companies at T = 0.17 and T = 0.90 respectively. However, when the IFL classifier does not predict the correct home page, it maps the company to an incorrect page. Our model tends to label companies as not having sites rather than providing an incorrect home page when it is uncertain of the right match. This can be seen in Table 6, where our models for T = 0.17 and T = 0.90 provide an incorrect home page for only 83 (14.4%) and 15 (2.6%) of these companies respectively, fewer than the 101 produced by the IFL classifier. While our model (at the specified thresholds) more often predicts that a website is missing, this is arguably preferable, from a data-quality standpoint, to returning incorrect websites.<br /> <br />
As discussed earlier, the number of sites identified as missing can be adjusted by changing the threshold T. This is illustrated in Figure 6, where the accuracy of our model is plotted against T. The accuracies in mapping companies with valid sites to correct URLs, and companies without valid sites to null responses, are also shown. As T increases, the accuracy for companies without sites increases as well, and the accuracy falls for companies with existing home pages. Thus, one can choose a specific T value depending on whether one is interested in the most accurate results or the fewest incorrect mappings.<br /> <br />
Figure 6: Analysis of results for differing threshold values. (Plot of Accuracy (%) against Threshold (T); legend: All Sites, With Valid Sites, Without Valid Sites, IFL.)<br /> <br />
The overall accuracies of our mapping algorithm are quite high relative to the IFL baseline, which has an accuracy of only 50.1% for all 947 companies, compared to our models’ 78.5% and 71.8% for thresholds of 0.17 and 0.90 respectively. Clearly, selecting the first return of a search engine is not a viable solution for automating the process of discovering company home pages. In contrast, our model-driven approach can perform over 56% better than this baseline, and provides the added ability to reject all candidate URLs for a company that has no website. 
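<br /> <br /> The thresholded mapping rule of Section 6.2 and the overall accuracies derived from Table 6 can be sketched as below. This is an illustrative reading, not the authors' code: `map_company`, `overall_accuracy`, and the example URLs and scores are hypothetical, and “highest-ranked pair” is interpreted here as the candidate with the highest model probability, which is an assumption.

```python
from typing import Dict, Optional


def map_company(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best-scoring candidate URL, or None when no candidate's
    Correct Website probability exceeds the threshold T."""
    if not scores:
        return None  # the search engine returned no candidates
    best_url = max(scores, key=scores.get)
    return best_url if scores[best_url] > threshold else None


def overall_accuracy(correct_with_site: int, correct_without_site: int,
                     n_companies: int) -> float:
    """Fraction of companies mapped to the right URL or rightly left unmapped."""
    return (correct_with_site + correct_without_site) / n_companies


# Hypothetical model scores for one company's candidate URLs.
candidates = {
    "http://example.com": 0.62,
    "http://directory.example.org/acme": 0.08,
}
print(map_company(candidates, threshold=0.17))  # a URL is accepted
print(map_company(candidates, threshold=0.90))  # every candidate is rejected

# "Correct" counts taken from Table 6 (947 companies in total).
ifl = overall_accuracy(474, 0, 947)
t017 = overall_accuracy(445, 298, 947)
relative_gain = t017 / ifl - 1
print(f"IFL: {ifl:.1%}, T=0.17: {t017:.1%}, relative gain: {relative_gain:.1%}")
```

The counts reproduce the reported 50.1% and 78.5%, and their ratio gives the improvement of over 56%. Applying the same formula to the T = 0.90 column (129 + 370 correct out of 947) gives about 52.7% rather than the 71.8% listed in Table 6; the text does not account for this difference.<br /> <br />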
The IFL classifier can reject all candidates only in the trivial case in which a search engine returns zero results.<br /> <br />
7. RELATED WORK<br /> <br />
Kolari et al. [5] build a classifier to detect if a given web page is a weblog (blog). They do this with the aim of differentiating authentic blogs from spam blogs. However, unlike in our work, they do not attempt to find the web page that corresponds to the desired blog for which a user is searching. Yu et al. [13] learn to classify web pages into three general classes of user interest: personal home pages, college admission pages, and resume pages. The goal of their work is to learn using only positively labeled examples and unlabeled data. Kruger et al. [6] also propose using text categorization to identify general classes of pages, with the aim of building niche search engines that execute queries on a specific part of the web. In particular, they use SVM classifiers to identify if a web page is a “Call for Papers”. Building a text classifier to identify if a web page is a company website is similar to these efforts; however, none of these previous studies attempt to find a web page of a particular class that belongs to a specified named entity.<br /> <br />
Attardi et al. [1] propose categorizing web pages not only by using the content of the page, but also by using the context in which a URL referring to it appears. This may be an effective approach to increase the accuracy of our models; however, it requires crawling the entire web, which does not seem necessary given the high accuracy of our current method. Cohen [3] also explores using external information (i.e., industry) to assign documents and web pages probabilities that they represent a specific company’s home page. This provides a useful starting point for other features in our model, such as incorporating the company’s address, industry, revenue, or other firmographic information when exploring a potential home page.<br /> <br />
8. CONCLUSION<br /> <br />
Determining the home page for a company identified only by its name is not a trivial task because many small companies have a limited web presence. Based on our research, almost 40% of small companies do not have a website that appears in the top ten search engine results. In this paper, we describe how the task of mapping a company name to a home page can be solved through a sub-task of evaluating if a given candidate URL matches the company name. We formulate this sub-task of evaluating matches as a supervised learning problem, and solve it by training classifiers on features that analyze website structure as well as text content. Using such a classifier based on a Stacked learner, we are able to achieve an accuracy of 78.5% in mapping a correct URL to a company name, with only 16.6% of company names mapped to an incorrect search result. A reasonable baseline method is to simply take the top-ranked search result as the company home page; such an approach yields an accuracy of only 50.1%. Our approach has the added benefit of being able to accurately predict that there are no reliable candidate URLs for a company name. While the task described in this paper is targeted at identifying home pages of companies, the methodology developed lends itself to abstraction: one can use an analogous approach for discovering home pages of people, institutions, or any other named entity.<br /> <br />
9. ACKNOWLEDGMENTS<br /> <br />
We acknowledge Ildar Khabibrakhmanov for his contributions during the initial phase of this work.<br /> <br />
10. REFERENCES<br /> <br />
[1] G. Attardi, A. Gulli, and F. Sebastiani. Automatic Web Page Categorization by Link and Context Analysis. Proceedings of THAI, 99:105–119, 1999.<br />
[2] C. Buckley, G. Salton, and J. Allan. The effect of adding relevance information in a relevance feedback environment. In Proceedings of the seventeenth annual international ACM-SIGIR conference on research and development in information retrieval. Springer-Verlag, 1994.<br />
[3] W. Cohen. Data integration using similarity joins and a word-based information representation language. ACM Transactions on Information Systems (TOIS), 18(3):288–321, 2000.<br />
[4] E. Frank and R. R. Bouckaert. Naive Bayes for text classification with unbalanced classes. In Proc. 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 503–510, 2006.<br />
[5] P. Kolari, T. Finin, and A. Joshi. SVMs for the Blogosphere: Blog Identification and Splog Detection. AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs, 2006.<br />
[6] A. Kruger, C. L. Giles, F. Coetzee, E. Glover, G. Flake, S. Lawrence, and C. Omlin. DEADLINER: Building a new niche search engine. In Ninth International Conference on Information and Knowledge Management, CIKM 2000, pages 272–281, Washington, DC, November 6–11 2000.<br />
[7] S. Kullback and R. Leibler. On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.<br />
[8] V. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, 10:707, 1966.<br />
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In Papers from the AAAI-98 Workshop on Text Categorization, pages 41–48, Madison, WI, July 1998.<br />
[10] P. Melville, Y. Liu, R. Lawrence, I. Khabibrakhmanov, C. Pendus, and T. Bowden. Finding new customers using structured and unstructured data. In Workshop on Mining Multiple Information Sources, The Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, August 12–15 2007.<br />
[11] J. Rennie, L. Shih, J. Teevan, and D. Karger. 
Tackling the poor assumptions of naive Bayes text classifiers. In Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), pages 616–623, 2003.<br />
[12] D. H. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992.<br />
[13] H. Yu, J. Han, and K. Chang. PEBL: positive example based learning for Web page classification using SVM. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 239–248, 2002.<br /> <br /> </div> </div> </div> </div> </div> </div> </div> <div class="col-lg-3 col-md-4 col-xs-12"> <div class="panel-meta panel panel-info"> <div class="panel-heading"> <h2 class="text-center panel-title">A Machine-Learning Approach to Discovering ... - Semantic Scholar</h2> </div> <div class="panel-body"> <div class="row"> <div class="col-md-12"> <span class="st">potential <em>website</em> matches for each company name based on a set of explanatory features extracted from the content on each candidate <em>website</em>. 
Our approach ...</span> </div> </div> </div> </div> </div> </div> </div> </body> </html>