Derived applications of the TF-IDF algorithm in SEO

A keyword density query is, in effect, a query for the TF value.


At the same time, if a word we are querying appears in an article, we conclude that the article is relevant to that word. Extending this idea, the more often the query word appears in a document, the more relevant that document should be to the query word.

We can use the

With this problem in mind, let us look at the TF-IDF algorithm and at how it is applied in SEO.

So in the TF-IDF algorithm, we first define TF(t, d), which denotes the number of times the word t appears in article d.
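As a minimal sketch of this definition, the function below counts how often a single word occurs in a piece of text. The function name tf, the sample sentence, and the simple lowercase tokenization are assumptions made for illustration; a real search engine tokenizes text in its own way.

```python
import re
from collections import Counter


def tf(term: str, document: str) -> int:
    """Return TF(t, d): the number of times `term` occurs in `document`."""
    # Assumption: a word is a run of letters or digits, compared case-insensitively.
    tokens = re.findall(r"[a-z0-9]+", document.lower())
    # Counter returns 0 for words that never appear.
    return Counter(tokens)[term.lower()]


article = "SEO tips: good SEO starts with content, not with SEO tricks."
print(tf("seo", article))  # 3
```

This version only handles single-word terms; counting a phrase such as "website weight" would need phrase matching instead of a per-word count.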

If a word or phrase appears with high frequency (TF) in one article and rarely appears in other articles, it is considered to have good category-distinguishing ability and to be well suited to classification.

TF-IDF is a statistical algorithm used for weighting in information retrieval. Simply put, its role is to assess how important a word is to a document.
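To make the weighting concrete, here is a small sketch of one common TF-IDF variant: the raw count of a word in a document multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the word. The article does not say which variant it has in mind, so the formula, the function name tf_idf, and the three-document corpus are all assumptions.

```python
import math


def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Weight of `term` in `doc`, relative to every document in `corpus`."""
    tf = doc.count(term)  # raw count of the term in this document
    df = sum(1 for d in corpus if term in d)  # documents containing the term
    if df == 0:
        return 0.0  # the term appears nowhere in the corpus
    idf = math.log(len(corpus) / df)  # assumption: unsmoothed natural-log IDF
    return tf * idf


# Hypothetical corpus: each document is already split into lowercase words.
corpus = [
    ["seo", "website", "weight", "seo"],
    ["seo", "keyword", "density"],
    ["seo", "link", "building"],
]

print(tf_idf("seo", corpus[0], corpus))     # 0.0, because every document has "seo"
print(tf_idf("weight", corpus[0], corpus))  # about 1.0986, i.e. log(3)
```

A word that occurs in every document gets a weight of zero under this variant, however often it is repeated, while a word found in only one document is weighted up.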

Applied to SEO, the paragraph above can be understood as follows. Suppose a company has 10 SEOers, each of whom has written an article about SEO, and all of these articles are placed in one document collection. We can expect the word "SEO" to be repeated in basically every article, which means all ten articles are related to SEO. Now I want to find an article about website weight in SEO, so I enter "SEO website weight" into the search engine.


In the end I found two articles in which both terms appear. The first contains "website weight" 2 times and "SEO" 10 times; the other contains "website weight" 10 times and "SEO" 2 times. The question now is: setting aside site quality (overall site weight), page quality (page weight), recommendations from company experts (high-quality links), and other factors, which article should rank first in the search results?
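One way to reason about this question is to score both articles with TF-IDF. The sketch below uses the term counts given above and the fact that "SEO" appears in all ten articles. The number of articles containing "website weight" is not stated, so the value 2 used here is an assumption, as are the names article_1 and article_2 and the unsmoothed log(N / df) form of IDF.

```python
import math

N_DOCS = 10  # ten articles in the collection, as in the scenario above

# Document frequencies: "SEO" is in all ten articles (from the text);
# "website weight" appearing in only two is an ASSUMPTION for illustration.
doc_freq = {"SEO": 10, "website weight": 2}

# Term counts of the two articles returned by the query.
articles = {
    "article_1": {"website weight": 2, "SEO": 10},
    "article_2": {"website weight": 10, "SEO": 2},
}


def score(counts: dict[str, int]) -> float:
    """Sum TF * IDF over the query terms (unsmoothed natural-log IDF)."""
    return sum(tf * math.log(N_DOCS / doc_freq[t]) for t, tf in counts.items())


# Highest score first.
ranking = sorted(articles, key=lambda name: score(articles[name]), reverse=True)
print(ranking)  # ['article_2', 'article_1']
```

Under these assumptions "SEO" contributes nothing to either score, because it occurs in every article, so the article that mentions "website weight" 10 times comes out ahead.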


But considering only the number of times a word appears is not enough, because our queries often contain two or more words, in a form such as "AA BB" or "XX YY ZZ". For a query of this form, which word's count should be the main basis for ranking?

The core concept of ?

In our earlier SEO work, the keyword density technique we applied is based on the TF principle.
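As a rough illustration of that link, keyword density can be computed as the term count divided by the total number of words on the page. The percentage form, the function name keyword_density, and the sample page text are assumptions; density tools differ in how they count words and phrases.

```python
import re


def keyword_density(keyword: str, text: str) -> float:
    """Share of the words in `text` that are `keyword`, as a percentage."""
    # Assumption: a word is a run of letters or digits, compared case-insensitively.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0  # avoid dividing by zero on empty text
    return 100.0 * words.count(keyword.lower()) / len(words)


page = "SEO basics: SEO is about content and links"
print(keyword_density("seo", page))  # 25.0
```

The numerator is exactly the TF count of the keyword; dividing by the page length only normalizes it so that long and short pages can be compared.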
