Thursday 4 December 2014

How to build a naive (very naive) system that scores over 30,000 in RecSysChallenge 2015?

-The task

Given a sequence of click events performed by a user during a typical session on an e-commerce website, the goal is to predict whether the user is going to buy something and, if so, which items the user is going to buy.

Details of RecSysChallenge 2015 can be found at http://2015.recsyschallenge.com/index.html.


-A naive system (too simple and too naive!!)

It is very easy to build a system that achieves a score > 30,000 using just two simple rules.

--Rules
Rule#1: Keep only the items which are bought at least MINF=10 times in the train data.
Rule#2: Of those, predict the items which are clicked at least MINCLICKS=2 times *within the same session* in the test data.

--Steps:
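Step#1: Obtain the list (buys.list) of items which are bought in the train data, one "item<TAB>frequency" pair per line.
Step#2: For each session in the test data, keep the items that satisfy both Rule#1 and Rule#2. Sessions with no such item get no prediction.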
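
The script below expects buys.list to exist already, so here is a minimal sketch of Step#1. It rests on assumptions that are not stated in this post: the train buys file is comma-separated with the item ID in the third column (the same position the script below reads it from in the test file), and every purchase row counts once regardless of quantity. The names count_buys, format_buys_list and make_buys_list.pl are made up for illustration.

```perl
use strict;
use warnings;

# Count how many purchase rows mention each item.
# ASSUMPTION: rows look like "session,timestamp,item,..." (item ID in column 3).
sub count_buys {
    my ($fh) = @_;
    my %freq;
    while (my $line = <$fh>) {
        chomp $line;
        my (undef, undef, $item) = split /,/, $line;
        next unless defined $item and $item ne "";
        $freq{$item}++;
    }
    return \%freq;
}

# Turn the counts into "item<TAB>frequency" lines, most frequently bought first.
sub format_buys_list {
    my ($freq) = @_;
    my @lines;
    for my $item (sort { $freq->{$b} <=> $freq->{$a} or $a cmp $b } keys %$freq) {
        push @lines, "$item\t$freq->{$item}\n";
    }
    return @lines;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print format_buys_list(count_buys($fh));
    close $fh;
}
```

It could be run as "perl make_buys_list.pl train_buys_file > buys.list". The MINF threshold is deliberately not applied here, because the script below already filters on it when reading buys.list.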

--The code in Perl
########################

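#Usage: perl a.pl buys.list MINF test_file MINCLICKS
if(@ARGV < 4){
        print STDERR "Usage: perl a.pl buys.list MINF test_file MINCLICKS\n";
        exit 1;
}

$MINF = $ARGV[1];
$MINCLICKS = $ARGV[3];

#Rule#1: load the items bought at least MINF times in the train data
open FP, $ARGV[0] or die "Cannot open $ARGV[0]: $!"; #buys.list: item<TAB>freq
while(<FP>){
        chomp;
        ($item, $freq) = split /\t/;
        next if($item eq "" or $freq < $MINF);
        $itemlist{$item} = $freq;
        #print STDERR "ITEM:$item\n";
}
close FP;

#Count the clicks per session for every item that passed Rule#1
open FP, $ARGV[2] or die "Cannot open $ARGV[2]: $!"; #test file
while(<FP>){
        chomp;
        ($sid, $stime, $item, $sg) = split /\,/;
        next if($sid eq "");
        next if(!exists $itemlist{$item});
        #print STDERR "Tobuy:$sid\t$item\n";
        $pred_buys{$sid}{$item}++;
}
close FP;

#Rule#2: output the items clicked at least MINCLICKS times in the session
foreach $sid (sort{$a <=> $b} keys %pred_buys){
        $pred = "";
        foreach $item (keys %{$pred_buys{$sid}}){
                $pred .= $item."," if($pred_buys{$sid}{$item} >= $MINCLICKS);
        }
        next if($pred eq ""); #no prediction for this session
        $pred =~ s/\,$//; #drop the trailing comma
        print "$sid;$pred\n";
}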

##The END
########################

Upload the results and get a score of 33780.1.

OK. Time to have a rest. Let us party!!!
