Google News ranking factors, 2003 patent revealed
Thu, Aug 20, 2009
by Chris Crum
A Google patent application, "Systems and methods for improving the ranking of news articles", was granted on August 18, 2009. The patent was originally filed about six years earlier, on September 16, 2003. There is an interesting analysis in human-readable language at Seo By The Sea by Bill Slawski.
Before continuing, it is best to read Bill's analysis.
Although this filing is six years old, I personally believe some of the theory is still valid today. It is important to know what Google was doing in 2003 to better understand what it may be doing today.
Abstract of the patent:
A system ranks results. The system may receive a list of links. The system may identify a source with which each of the links is associated and rank the list of links based at least in part on a quality of the identified sources.
I will first discuss the points already established and then try to draw my own conclusions.
Source Rank
: This is a rank given to different news sources. An article from a source with a higher "Source Rank" is more likely to rank higher than others. According to the patent, the following metrics go into determining the "Source Rank".
Number of articles produced by the news source during a given time period
: Presumably the more the better, or rather, the more original articles the better, as opposed to newswire stories.
Average length of an article from the news source
: Presumably, a news source with longer articles would get a better Source Rank.
Breaking news score
: The most interesting aspect. I had a rough feeling this was an important factor, and the patent agrees. I'll discuss it in my conclusions below, citing examples. Basically, as per the patent, a news source which publishes news about events that have just occurred gets a higher Source Rank.
: Tracking click-throughs from Google News search and analyzing that data. All links on Google News are redirected through their forwarder. They have been tracking this data for as long as I can remember.
Human opinion of the news source
: Quite obvious :)
Circulation statistics of the news source
: Circulation stats from various media monitoring agencies.
The size of the staff associated with the news source
: Google recently started showing (where possible) author names in news search. These are detected automagically using some algorithm. I'm quite sure that they have been tracking these internally for quite some time.
The number of news bureaus associated with the news source
: To favour bigger, more established news outlets.
Original named entities appearing in articles produced by the news source
: A named entity is a specific person, place, organization, or thing. The more unique named entities the better. This probably indicates a more in-depth news source.
Number of topics on which the source produces content
: To determine the niche the news source participates in. A news source like TechCrunch writes almost exclusively about tech, so Google may then determine that TechCrunch is an authority on tech-related topics.
International diversity of the news source
: Checking which countries people visit these sites from via Google News Search, based on IP address.
The writing style used by the news source
: Grammar, spelling, readability. Writing style may also help Google determine the target audience (e.g. British vs American English).
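To make the idea concrete, here is a rough sketch of how metrics like the ones above could be folded into a single Source Rank and then used to order a list of links, as the abstract describes. Everything in it is my own assumption: the patent does not disclose any weights, the metric names in `WEIGHTS` are made up for illustration, and each metric is assumed to be already normalized to a value between 0 and 1.

```python
from urllib.parse import urlparse

# Hypothetical weights. The patent gives no numbers; these only illustrate
# that some metrics could count for more than others.
WEIGHTS = {
    "article_volume": 1.0,
    "average_length": 1.0,
    "breaking_news": 2.0,
    "usage": 1.5,
    "human_opinion": 1.0,
    "writing_style": 0.5,
}


def source_rank(metrics, weights=WEIGHTS):
    """Weighted average of metrics, each assumed to be normalized to 0..1.

    Metrics missing from `metrics` count as 0.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(w * metrics.get(name, 0.0) for name, w in weights.items())
    return weighted_sum / total_weight


def rank_links(links, rank_by_host):
    """Order links by the Source Rank of their host, highest first.

    Hosts without a known rank are treated as rank 0.
    """
    return sorted(
        links,
        key=lambda url: rank_by_host.get(urlparse(url).netloc, 0.0),
        reverse=True,
    )
```

A real system would rank on far more than the source (the abstract only says "based at least in part on" source quality), so treat this as a toy model of that one ingredient.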
Although this is not at all related to what Google told us about ranking on Google News, it does provide some nice insight.
Now, what I believe is that Google News also implements what I'd like to call a Source Rank per Topic. The Breaking news score, as explained above, is applicable on a per-topic basis too. For example, my site had a few stories about an incident just after a major news story broke. It got some traffic, then got crowded out by the regular big sources, which presumably have a much higher Source Rank. But from a couple of days later, any follow-ups I did ranked well on Google News. My assumption is that Google sees which sources were the ones to break the particular story and assigns them a temporary (or permanent) authority on the topic.
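Here is a small sketch of how such a per-topic boost might work. This is purely my guess, not something the patent spells out: the one-hour window, the flat `boost` value, and the function names are all hypothetical.

```python
from collections import defaultdict


def topic_breakers(articles, window_seconds=3600):
    """Map each topic to the sources that published soon after it first appeared.

    `articles` is a list of (source, topic, unix_timestamp) tuples. A source
    counts as having "broken" a topic if it published within `window_seconds`
    of the earliest article on that topic (an assumed rule).
    """
    first_seen = {}
    for _source, topic, ts in articles:
        if topic not in first_seen or ts < first_seen[topic]:
            first_seen[topic] = ts

    breakers = defaultdict(set)
    for source, topic, ts in articles:
        if ts - first_seen[topic] <= window_seconds:
            breakers[topic].add(source)
    return dict(breakers)


def topical_rank(base_rank, source, topic, breakers, boost=0.2):
    """Add a flat, hypothetical boost on topics the source helped break."""
    if source in breakers.get(topic, set()):
        return base_rank + boost
    return base_rank
```

With a boost like this, a small site with a low base rank could outrank a bigger source on follow-ups to a story it broke, while still losing to it on every other topic.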
I have no views on the content length point, but I do agree that more original sentences result in a higher Source Rank.
Another point which I don't see mentioned, but which I strongly believe to be an important factor for the Source Rank, is the performance of the website. It's basic common sense that if Google is sending a lot of traffic, they don't want these people to wait for ages while the overloaded servers of the news site churn out the pages. Google would rather send them to faster sites. I observed this personally after I implemented a new caching mechanism which made the average random page generation time drop from about 1s to 50-100ms. Within days my traffic from Google doubled. So even if you are running a small site like mine, it is best to keep your random page load delay as small as possible.
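For readers wondering what page caching looks like in practice, below is a generic sketch of a time-based page cache. It is not the mechanism used on my site, just a minimal illustration; the `PageCache` class, the `render` callback, and the 60-second lifetime are all assumptions.

```python
import time


class PageCache:
    """Keep rendered pages in memory and reuse them until they expire."""

    def __init__(self, render, ttl_seconds=60.0, clock=time.monotonic):
        self._render = render    # slow function: path -> HTML string
        self._ttl = ttl_seconds  # how long a cached page stays fresh
        self._clock = clock      # injectable so the cache can be tested
        self._store = {}         # path -> (time cached, html)

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # fresh copy: skip the slow render
        html = self._render(path)
        self._store[path] = (now, html)
        return html
```

Only the first request for a page (and the first after each expiry) pays the full generation cost. Every other request is served from memory, which is the kind of change that can take a page from around a second down to a few milliseconds.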
Google also (IMHO) considers regular SEO practices in determining the Story Rank for a news source: internal linking, external linking, etc.