Archive for June, 2010

Graph clustering algorithm

June 30, 2010

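First of all, this is a classic problem, but I still only half understand some of the algorithms I have read about; my weak fundamentals are the cause.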

The methods I have seen fall into two main categories:

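  1. Graph partitioning based on minimum cut or spectral partitioning. Put simply, it minimizes the number of connections between different clusters. For details see http://www.cs.berkeley.edu/~demmel/cs267/lecture20/lecture20.html. This approach has some shortcomings: it can only split the graph in two at a time, and when the number of clusters is not known in advance, repeated bisection is needed, for which minimizing the connections between clusters is not a good criterion. Still, I find the bisection idea quite elegant: it draws on linear algebra and matrix theory, so the mathematical foundation is strong.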
  2. Modularity. The basic idea is “There must be a smaller than expected number of edges between communities”.
    Define modularity to be Q = (number of edges within groups) - (expected number within groups). For details, search for “Modularity and community structure in networks”.
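As a rough illustration of the modularity definition above, here is a minimal Perl sketch. It uses the commonly quoted normalized form, Q = sum over communities c of [ e_c / m - (d_c / 2m)^2 ], where m is the total number of edges, e_c the number of edges inside community c, and d_c the sum of the degrees of its nodes. The function name `modularity`, the edge-list input format, and the toy graph (two triangles joined by a single edge) are assumptions made for this example, not taken from any particular package.

```perl
use strict;
use warnings;

# Modularity of a given partition of an undirected, unweighted graph.
#   $edges        : array ref of [u, v] pairs (hypothetical input format)
#   $community_of : hash ref mapping node => community label
sub modularity {
    my ($edges, $community_of) = @_;
    my $m = scalar @$edges;
    return 0 if $m == 0;

    my (%inside, %degree);
    for my $edge (@$edges) {
        my ($u, $v) = @$edge;
        my ($cu, $cv) = ($community_of->{$u}, $community_of->{$v});
        $degree{$cu}++;                  # each edge adds one degree unit per endpoint
        $degree{$cv}++;
        $inside{$cu}++ if $cu eq $cv;    # edge lies entirely inside one community
    }

    my $q = 0;
    for my $c (keys %degree) {
        my $e_c = $inside{$c} // 0;
        # observed fraction of edges inside c, minus the fraction expected at random
        $q += $e_c / $m - ($degree{$c} / (2 * $m)) ** 2;
    }
    return $q;
}

# Toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
my @edges  = ([0, 1], [1, 2], [0, 2], [3, 4], [4, 5], [3, 5], [2, 3]);
my %split  = (0 => 'a', 1 => 'a', 2 => 'a', 3 => 'b', 4 => 'b', 5 => 'b');
my %lumped = map { $_ => 'all' } 0 .. 5;

printf "two triangles: Q = %.4f\n", modularity(\@edges, \%split);
printf "one community: Q = %.4f\n", modularity(\@edges, \%lumped);
```

Splitting the graph into its two triangles gives Q = 5/14, while putting every node into a single community gives Q = 0. This only scores a partition that is already given; searching for the partition that maximizes Q is the expensive part.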

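Implementing these algorithms seems quite hard. Fortunately there are some open-source packages, for example ones for MATLAB, the Graclus software, and Graph Clustering.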

These algorithms are all fairly time-consuming and have high complexity, which becomes a problem on large-scale data.

Categories: interesting research

Summary [3]

June 15, 2010

A brief summary of last week, from 7 June to 13 June.

It was an exciting week for my GSoC project. I released the first version of my extension, BUGSTATS, and it provides the basic functions expected.

The main steps were:

  • Add a new hook, “additional_user_data”, in bugzilla/template/en/default/global/user.html.tmpl.
  • Add a hyperlink template that points to the personal stats page, in BugStats/template/en/default/hook/global/user-additional-user-data.html.tmpl.
  • For the personal stats page, use the existing hook “page_before_template”, which is commonly used to add newly defined pages. In BugStats/template/Extension.pm, the sub “page_before_template” passes the stats information to the stats page template, which is located at BugStats/template/en/default/pages/stats/user.html.tmpl.

Along the way I also learned some basics of the Template Toolkit, which is smart and useful.

Demo: bugstats

Categories: GSoC2010

Summary [2]

June 6, 2010

Summary for this week:

1. While reading the Bugzilla source code, I ran into some Perl language problems, so I studied some advanced Perl features, including references, DBI, and templates.

2. My task for GSoC is to create an extension that collects statistics to show about a user in Bugzilla. First I need to understand how Bugzilla extensions work, and then how to write one. I am studying two extensions in Bugzilla: /extensions/example and /extensions/voting.

3. I have also started writing a “HelloWorld” extension, which first collects some stats, including #comments, #fixedBug, #SubmittedBug.

I implemented the hook sub page_before_template:


sub page_before_template {
    my ($self, $args) = @_;
    my $page = $args->{page_id};
    my $vars = $args->{vars};

    # Only handle requests for the stats/user page.
    if ($page =~ m{^stats/user\.}) {
        _page_user($vars);
    }
}

sub _page_user {
    my ($vars) = @_;
    my $dbh = Bugzilla->dbh;
    my $user = Bugzilla->user;
    my $input = Bugzilla->input_params;
    my $who_id = $input->{user_id} || $user->id;
    my $who = Bugzilla::User->check({ id => $who_id });

    # One query per contribution type; @types and @sql_statements share the same index order.
    my (@sql_statements, %all_bug_ids, @all_bug_cnts, $id);
    my @types= qw( #bugs_reported #bugs_assigned #comment #voting #cc #qa #patch );

    $sql_statements[0] = "SELECT bugs.bug_id FROM  bugs  WHERE bugs.reporter = ?";
    $sql_statements[1] = "SELECT bugs.bug_id FROM  bugs  WHERE bugs.assigned_to = ?";
    $sql_statements[2] = "SELECT DISTINCT longdescs.bug_id FROM longdescs  WHERE longdescs.who = ?";
    $sql_statements[3] = "SELECT votes.bug_id FROM votes  WHERE votes.who = ?";
    $sql_statements[4] = "SELECT cc.bug_id FROM  cc WHERE cc.who = ?";
    $sql_statements[5] = "SELECT bugs.bug_id FROM bugs WHERE bugs.qa_contact = ?";
    $sql_statements[6] = "SELECT attachments.bug_id FROM attachments WHERE attachments.submitter_id = ? AND attachments.ispatch = 1";
    
    for (my $index = 0; $index < @sql_statements; $index++){
        my $sth = $dbh->prepare($sql_statements[$index]);
        $sth->execute($who->id);
        my @bug_ids;
        while(($id) = $sth->fetchrow_array())
        {
            push (@bug_ids, $id);
        }
        $all_bug_ids{$types[$index]} = [@bug_ids];
    
        my $cnt = @bug_ids;
        push (@all_bug_cnts, $cnt);
        $sth->finish();
    }


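    # Calculate points for the user: log-damped counts with a weight per type.
    # Every count gets +1 so that a zero count never calls log(0), which is fatal in Perl.
    # The QA-contact count (index 5) is not part of the score.
    my $point = log($all_bug_cnts[0] + 1)
              + 2 * log($all_bug_cnts[1] + 1)
              + log($all_bug_cnts[2] + 1) / log(10)
              + log($all_bug_cnts[3] + 1)
              + log($all_bug_cnts[4] + 1)
              + 3 * log($all_bug_cnts[6] + 1);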

    $vars->{'all_bugs'} = \%all_bug_ids;
    $vars->{'point'} = $point;
    $vars->{'user'} = $who;
    #$vars->{'types'} = ['#bugs_reported', '#bugs_assigned', '#comment', '#voting', '#cc', '#qa','#patch'];
}
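To try the scoring idea without a Bugzilla installation, the point calculation in _page_user can be pulled out into a small standalone function. This is only a sketch: the name `contribution_points` and the named count keys are hypothetical, and the weights simply mirror the listing above. Each count is offset by one before the logarithm is taken, because Perl's log dies on 0.

```perl
use strict;
use warnings;

# Hypothetical helper that mirrors the weights used in _page_user.
# Counts are passed by name; a missing count is treated as 0.
sub contribution_points {
    my (%count) = @_;
    # log(n + 1): a count of 0 contributes 0 and never reaches log(0)
    my $ln = sub { log(($_[0] // 0) + 1) };
    return $ln->($count{reported})
         + 2 * $ln->($count{assigned})
         + $ln->($count{comments}) / log(10)   # comments are scored with a base-10 log
         + $ln->($count{votes})
         + $ln->($count{cc})
         + 3 * $ln->($count{patches});
}

# Example with made-up counts.
printf "points: %.3f\n", contribution_points(
    reported => 4, assigned => 2, comments => 9,
    votes    => 0, cc       => 3, patches  => 1,
);
```

Because the logarithm flattens large counts, a single patch (weight 3) can outweigh many comments, which seems to be the intent of the weights in the listing.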

PS: DBI: http://www.felixgers.de/teaching/perl/perl_DBI.html

Template Toolkit: http://template-toolkit.org/docs/tutorial/Web.html

Categories: GSoC2010, Perl