News: Google considering ranking sites on factual content rather than popularity

Zombie_Fish · Mar 2, 2015

thaluikhain said:
Or, potentially, other news services people at Google happened not to like.

Yeah, I'm a bit wary of this.

Google has had the functionality to pull any sites they don't like from their search results for years. One of Matthew Inman's former sites was pulled for helping improve the rankings of spam-y sites.[footnote]http://moz.com/blog/widgetbait-gone-wild[/footnote]

That one was pulled for actual abuse of the system as opposed to Google's own biases, but the point still stands: If you don't trust this proposed change then you shouldn't really be trusting them anyway.

DoPo · Mar 2, 2015

Zombie_Fish said:
thaluikhain said:

Or, potentially, other news services people at Google happened not to like.

Yeah, I'm a bit wary of this.

Google has had the functionality to pull any sites they don't like from their search results for years. One of Matthew Inman's former sites was pulled for helping improve the rankings of spam-y sites.[footnote]http://moz.com/blog/widgetbait-gone-wild[/footnote]

That one was pulled for actual abuse of the system as opposed to Google's own biases, but the point still stands: If you don't trust this proposed change then you shouldn't really be trusting them anyway.

Heh, seems people don't actually know about the dark arts of SEO. The negative SEO.

Here is the brief of it - SEO stands for Search Engine Optimisation. In simple terms, it's techniques that would give you higher ranking in Google results. Negative SEO is, as the name suggests, the opposite. Usually done to somebody else. Hey, it's one way to get ahead of them. It's a really scummy practice and usually frowned upon (by Google, too) but a practice nonetheless. It's been around for years, and it's still around.

Also, it's not like Google aren't removing search results. They are. Usually by request. And, yeah, it's not always done with good intentions or results [http://torrentfreak.com/google-porn-takedowns-carpet-bomb-github-150107/]. For the record, other takedown requests are funny [https://www.chillingeffects.org/notices/312319] or even sort of retarded [https://www.chillingeffects.org/notices/357263].

But, as always, Chilling Effects [https://www.chillingeffects.org/] would show any of them. It's a website dedicated to takedown notices of all kinds (not just limited to Google). It's a really good resource for...well, some chilling insights into what's going on behind the scenes. Google have so far shown that they want transparency in those matters [http://torrentfreak.com/google-protects-chilling-effects-from-takedown-notices-140727/].

Still, if somebody at Google decides to just ban Fox News, that would probably not be documented. It might be celebrated. However, it would also be noticed by Fox News, who are sure to contact Google, and if Google can't give them a good explanation for what's going on, there are sure to be lawsuits lining up. I am fairly sure accountability is...well, accounted for.

EDIT: Oh, I forgot - there is also the Google transparency report [https://www.google.com/transparencyreport/] - it's another interesting resource.

FalloutJack · Mar 2, 2015

If wikipedia isn't dead last, I call shenanigans.

Never trust wikis.

Nowhere Man · Mar 2, 2015

So while the media outlets churn out what they consider truth based on the narratives they choose to create, Google will be there to aggregate and push to the top of the rankings the links that are based on what these "facts" are. Yeah this will end well. Hideo Kojima help us all.

Edit: Of course now I sound like a conspiracy nutter so I'll add that I don't really fully trust the idea but I am curious about the results once they get it working. I can just see the potential for abuse, but it's not like the current system isn't abused anyway.

Alcamonic · Mar 2, 2015

So once this system is in place, what would happen if you write in Fox News?

Thaluikhain · Mar 2, 2015

Zombie_Fish said:
thaluikhain said:

Or, potentially, other news services people at Google happened not to like.

Yeah, I'm a bit wary of this.

Google has had the functionality to pull any sites they don't like from their search results for years. One of Matthew Inman's former sites was pulled for helping improve the rankings of spam-y sites.[footnote]http://moz.com/blog/widgetbait-gone-wild[/footnote]

That one was pulled for actual abuse of the system as opposed to Google's own biases, but the point still stands: If you don't trust this proposed change then you shouldn't really be trusting them anyway.

Well, yes, but then people get worried when their government talks about doing something possibly shady, in the knowledge that they've been doing shady things for years.

Smooth Operator · Mar 2, 2015

Well it would certainly be a welcome change to the current commercial system.
Although I would mainly want it to have a "here is some factual information we found on the subject" in their sidebar, rather than eliminating results in the background based on aggregate information that might or might not be accurate.

The proposed system is without a doubt completely non-trivial, but before Google came along with their algorithm, search engines were useless pieces of dog shit, so if someone can put together a wild new process it is probably Google.

Nathan Josephs · Mar 2, 2015

oh crap *certain* political interests would have a total hissy fit if it ever came to pass.

CrystalShadow · Mar 2, 2015

Well, there goes wikipedia's search ranking down the drain... XD

But in all seriousness, this doesn't sound half bad.

I liked a different system more, but getting it to work would be overly intrusive.
It would require a browser plugin and knowing what people searched for, which links they followed, and how much time they spent looking at any given page.
And given that it would be running in the browser the whole time, that's an even bigger privacy issue than search engines in general.
The system was described in a book. Basically, rather than looking at the number of links to a page, it correlates what search terms were entered with which sites a person chose to look at based on the search results, and crucially, how much time they actually spent looking at the pages that they checked as a result of the search.

The idea being, if the returned page wasn't what the person doing a search was looking for, they'd leave quickly, while if it was what they were looking for (or at least something relatively similar), they'd stay on the page much longer.

So in principle that would result in search rankings reflecting what people actually bother to look at, and things which get skipped almost immediately are clearly not good results.

Checking for truthfulness of a site is of course also useful, but only if factual content is present.

Still, this other system would be relatively intrusive by comparison. (On the other hand, it could be opt-in. You don't need EVERYONE to do it to improve search results, just enough of them to get meaningful data. - You could also fall back to another method for searches where there isn't enough data to really work anything out.)
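For what it's worth, here is a rough sketch of how that dwell-time idea could look in code. It is only an illustration under my own assumptions: the function name, the `min_visits` threshold and the example URLs are all made up, and a real system would need far more than an average.

```python
from collections import defaultdict

def rank_by_dwell_time(visits, min_visits=2):
    """Rank pages per search term by average time spent on them.

    `visits` is a list of (search_term, url, seconds_on_page) tuples.
    All names and thresholds here are hypothetical.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (term, url) -> [total seconds, visit count]
    for term, url, seconds in visits:
        entry = totals[(term, url)]
        entry[0] += seconds
        entry[1] += 1

    rankings = defaultdict(list)
    for (term, url), (total, count) in totals.items():
        if count >= min_visits:  # too little data: leave it to some other ranking method
            rankings[term].append((total / count, url))

    # Longest average dwell time first.
    return {term: [url for _, url in sorted(pages, reverse=True)]
            for term, pages in rankings.items()}

visits = [
    ("cat food", "a.example", 120),
    ("cat food", "a.example", 90),
    ("cat food", "b.example", 3),
    ("cat food", "b.example", 5),
    ("cat food", "c.example", 300),  # only one visit, so it is ignored here
]
print(rank_by_dwell_time(visits))
```

Pages people leave after a few seconds sink, pages they stay on rise, and anything without enough visits is simply left out so another method can handle it.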

Sarge034 · Mar 2, 2015

Nathan Josephs said:
oh crap *certain* political interests would have a total hissy fit if it ever came to pass.

So... all of them?

Anyway, it's an interesting idea, but as has been said above, I don't think it's going to work in most situations. If I had to guess, I'd say it'll be under the advanced search options like the rest of the more academically focused search options are.

jklinders · Mar 2, 2015

Well now I can just see the typical tin hatter sites sprinkling into their blog posts simple random true facts (sun rises in the east, winter is cold, summer is hot etc) to try to trick this system into continuing to give them top billing. This depends greatly on how powerful the system's ability to understand context is going to be. The entire time they are doing this they will be crying conspiracy and censorship over Google doing this. This will reinforce their victim complex and probably drive more idio-er people to their flags.

This is definitely far from sorted.

Shinkicker444 · Mar 2, 2015

Alcamonic said:
So once this system is in place, what would happen if you write in Fox News?

Well I imagine you'd have to wade through about 15 pages of search returns until you find the actual fox news website.

DoPo · Mar 2, 2015

Nathan Josephs said:
oh crap *certain* political interests would have a total hissy fit if it ever came to pass.

Oh my god, and Google are actually having a ball with this, right now. I've only started reading their paper and here are some quotes

KV stores information in the form of RDF triples (subject, predicate, object). An example is
</m/02mjmr, /people/person/place_of_birth /m/02hrh0_>, where /m/02mjmr is the Freebase id for Barack Obama, and /m/02hrh0_ is the id for Honolulu.

First, the Knowledge Vault is different from previous works on automatic knowledge base construction as it combines noisy extractions from the Web together with prior knowledge, which is derived from existing knowledge bases (in this paper, we use Freebase as our source of prior data). [...] KV's prior model can help overcome errors due to the extraction process, as well as errors in the sources themselves. For example, suppose an extractor returns a fact claiming that Barack Obama was born in Kenya, and suppose (for illustration purposes) that the true place of birth of Obama was not already known in Freebase. Our prior model can use related facts about Obama (such as his profession being US President) to infer that this new fact is unlikely to be true. The error could be due to mistaking Barack Obama for his father (entity resolution or co-reference resolution error), or it could be due to an erroneous statement on a spammy Web site (source error).

KV is a probabilistic database, and it can support simple queries, such as BarackObama BornIn ?, which returns a distribution over places where KV thinks Obama was born.

And more. Obama is the core of a lot of their examples.
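To make the "triples plus probabilities" part a bit more concrete, here is a toy sketch. It is not the paper's implementation: the readable names stand in for Freebase ids, and the confidence numbers are invented purely for illustration.

```python
from collections import defaultdict

# (subject, predicate, object) -> confidence that the triple is true.
# These confidence values are invented for illustration only.
triples = {
    ("BarackObama", "BornIn", "Honolulu"): 0.95,
    ("BarackObama", "BornIn", "Kenya"): 0.05,
    ("BarackObama", "Profession", "USPresident"): 0.99,
}

def query(subject, predicate):
    """Return a normalised distribution over objects for (subject, predicate, ?)."""
    scores = defaultdict(float)
    for (s, p, o), confidence in triples.items():
        if s == subject and p == predicate:
            scores[o] += confidence
    total = sum(scores.values())
    return {o: score / total for o, score in scores.items()} if total else {}

print(query("BarackObama", "BornIn"))
```

So a query like BarackObama BornIn ? gives back every candidate place with a weight attached, rather than a single answer.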

happyninja42 · Mar 2, 2015

tippy2k2 said:
While I appreciate the effort, I wonder how this will actually (or if it CAN actually) work. Some things have cold hard facts to them; like water freezes at 32 F, the sun is a star, tippy2k2 is the sexiest Escapist user, etc.

However, how is that going to work with things that are not 100% factual? I don't even mean subjective things like best movie ever or anything like that but things like how much water is needed every day for someone to be healthy, how much sleep you should get in a day, the best way to treat a fever, which body part is the sexiest on tippy2k2. These are all questions that don't necessarily have a "right" answer and even experts have different opinions...

It would be nice to have a more reliable way to search through information but I'm not sure how possible it is.

I think it would be pretty easy to develop some kind of accuracy metric, based on cited sources for the information versus random opinions without any evidence to back them up.

DoPo · Mar 2, 2015

Happyninja42 said:
I think it would be pretty easy to develop some kind of accuracy metric, based on cited sources for the information versus random opinions without any evidence to back them up.

But then you get into a complicated situation trying to verify the validity of a statement. There are two types of problems (that I can think of) you can hit:

1. There are sources that back up a statement. The sources are also backed up. The sources of those, however, are bogus. Example: propagated misinformation, such as "The humans use only 10% of their brains" or some shit. The chain could be really long, you have to parse the entirety of it, and if the source is something free text (so, the most likely scenario), you have to somehow make the computer understand enough to verify if the fact is indeed there. This makes verifying a statement quite hard and quite time inefficient.

2. The bottom-line sources that back up a statement are "common sense". It's really hard to find common sense on the Internet. And that's not a pun. Even though I often feel that is also true, but I digress. Usually IRL common sense knowledge is not on the web because...well, it's common sense. You rarely need to document that, for example, riding a bicycle is a faster means of transportation than walking.

But actually what the KV is doing is not that complicated or new, if we look at it in simplified form. It's a lot like backwards chaining but with fuzzy logic applied to the new facts. In effect, it's got a bunch of data which expresses a number of statements with varying degrees of certainty, and when it comes across a new statement, it verifies it against the previously known stuff and, based on that, assigns it a confidence, then adds it to the list of other statements. That's really a basic explanation of how it works; the actual one involves more maths and, frankly, I struggled to follow it myself. It makes sense, though; as I said, it's a lot like backwards chaining.

Let's illustrate how things are supposed to go. Let's say, the KV comes across the statement:
"tippy2k2 is the sexiest person in the solar system"
Now, it runs this through the things we already know, and it turns out that it knows the following
tippy2k2 was unanimously voted the most sexy Escapist user
tippy2k2 is the sexiest person in USA
tippy2k2 is the sexiest person in South America
tippy2k2 is the sexiest person on the moon
tippy2k2 is the sexiest person on Jupiter
tippy2k2 is the sexiest user in the Atlantic ocean

Now, of course we know the new statement is true, but the KV is going to reason only with the above facts. So, it assigns some confidence value to the new statement, let's say, 90% since it seems likely based on these facts.

Now, it comes across the statement "shrekfan246 is the sexiest person in the universe". For simplicity's sake, let's assume there are three solar systems in the entire universe (small world, eh). And the KV knows that shrekfan246 is the sexiest person in the other two solar systems. However, seeing that he isn't unanimously the sexiest person in all of them (remember, tippy2k2 is 90% likely to be the sexiest in our solar system), the KV decides that it's pretty likely but not totally, so it assigns a confidence of 98% to the statement.

And so on and so forth.
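If it helps, here is a toy version of that "check the new statement against what you already know, then store it" loop. To be clear, this is a plain averaging heuristic I made up for illustration, not the KV's actual maths and not real fuzzy logic or backwards chaining. The `parts` mapping, the 0.5 "undecided" value and the extra place "Mars" are all assumptions.

```python
def score_claim(claim, known, parts):
    """Score a (who, place) claim against what is already known.

    known: dict mapping (who, place) -> confidence in [0, 1]
    parts: dict mapping a place to the smaller places it is made of
    Both structures are hypothetical, for illustration only.
    """
    who, place = claim
    sub_places = parts.get(place, [])
    if not sub_places:
        return 0.5  # nothing to reason from: stay undecided
    # Average the support across sub-places; unknown sub-places count as 0.5.
    support = [known.get((who, sub), 0.5) for sub in sub_places]
    return sum(support) / len(support)

known = {
    ("tippy2k2", "USA"): 1.0,
    ("tippy2k2", "South America"): 1.0,
    ("tippy2k2", "the moon"): 1.0,
    ("tippy2k2", "Jupiter"): 1.0,
}
# "Mars" is a place we know nothing about, so it drags the confidence down a bit.
parts = {"solar system": ["USA", "South America", "the moon", "Jupiter", "Mars"]}

claim = ("tippy2k2", "solar system")
known[claim] = score_claim(claim, known, parts)  # the scored claim becomes a known fact
print(known[claim])
```

The point is only the shape of the process: the new statement gets a confidence derived from existing statements, and then it joins the pile that later statements are checked against.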
