Refine bugzilla extension code

July 17, 2010

After I hosted the source code on GitHub, my mentor Guy suggested that I refine the code. I have now finished that job and committed the code to GitHub.

PS: the git commands I use most

  1. git add .
  2. git commit -m "your comments"
  3. git push origin master
Categories: GSoC2010

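A nasty bug from the past week

July 9, 2010

1. I had not written Java for a while, and when I did, I usually learned as I went, the kind of coding that is impossible without Google.
The problem: the key of a HashMap was a custom object, MyObject, so to get the lookup I needed I had to override equals() and hashCode() myself. I knew all that, but it still went wrong: in my equals() override I compared two strings with "==" when I should have used String's equals(), so lookups in the HashMap failed.
So what is the difference between "==" and equals()?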

String s1 = new String("str");
String s2 = new String("str");

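Comparing them with == returns false, because two objects were created and they sit at different addresses in memory.
equals() returns true. It is a method of java.lang.Object, which every Java class inherits from by default, so every class has it; String overrides it to compare the character contents.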
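
The HashMap situation described above can be sketched roughly as follows. This is only an illustration under assumptions: MyObject is the name from the post, but its single field `name` and the wrapper class `KeyDemo` are hypothetical, since the original class is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyDemo {
    // Hypothetical key class; the field "name" is an assumption for illustration.
    static final class MyObject {
        private final String name;

        MyObject(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof MyObject)) return false;
            // Compare string contents with equals(), not references with ==.
            return Objects.equals(name, ((MyObject) other).name);
        }

        @Override
        public int hashCode() {
            // Must agree with equals(): equal keys produce equal hash codes.
            return Objects.hashCode(name);
        }
    }

    public static void main(String[] args) {
        String s1 = new String("str");
        String s2 = new String("str");
        System.out.println(s1 == s2);      // false: two distinct objects
        System.out.println(s1.equals(s2)); // true: same characters

        Map<MyObject, Integer> map = new HashMap<>();
        map.put(new MyObject(s1), 1);
        // The lookup succeeds because equals() and hashCode() compare content.
        System.out.println(map.get(new MyObject(s2))); // 1
    }
}
```

If equals() compared `name` with ==, the last lookup would return null here, because s1 and s2 are different objects even though their contents match.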

2. HadoopUtils::toInt(const string & str); kept throwing exceptions. I tried to fix it n times, and each run takes an hour!!! Faint!
In short, programs have to be precise, and code review is important.

Categories: Program

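Implementing dynamic 2D arrays in C

July 1, 2010

"Implement allocation and deallocation of the space for a 2D pointer, int **ptr"

1. Number of allocation calls: (rows + 1). This version uses C++ new/delete rather than malloc.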

 
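#include <iostream>
#include <new>

// Allocates an m x n array with (m + 1) calls to new; returns NULL on failure.
int** New2DPointer(int m, int n)
{
    if (m <= 0 || n <= 0)
    {
        std::cout << "invalid input parameters\n";
        return NULL;
    }

    int **ptr = NULL;
    int i = 0;
    try {
        ptr = new int*[m];
        for (; i != m; ++i)
        {
            ptr[i] = new int[n];
        }
    }
    catch (const std::bad_alloc &)
    {
        std::cout << "Error allocating memory." << std::endl;
        // Release the rows allocated so far, then the row index.
        while (i > 0)
        {
            delete [] ptr[--i];
        }
        delete [] ptr;   // deleting NULL is a no-op
        return NULL;
    }

    return ptr;
}

void Delete2DPointer(int **p, int m)
{
    if (!p)
        return;
    for (int i = 0; i != m; ++i)
    {
        delete [] p[i];   // was p[m], which is out of bounds
    }
    delete [] p;          // also release the row index
}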

Number of malloc calls: (1 + 1)

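#include <stdlib.h>

/* Two malloc calls: one for the row pointers, one for all the elements. */
/* To release (rows > 0): free(array2d[0]); free(array2d); */
int **array2d_new(size_t rows, size_t cols)
{
    int **array2d, **end, **cur;
    int *array;

    cur = array2d = malloc(rows * sizeof(int *));
    if (!array2d)
        return NULL;

    array = malloc(rows * cols * sizeof(int));
    if (!array)
    {
        free(array2d);
        return NULL;
    }

    /* Point each row at its slice of the contiguous element block. */
    end = array2d + rows;
    while (cur != end)
    {
        *cur = array;
        array += cols;
        cur++;
    }

    return array2d;
}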

Number of malloc calls: 1 (the row index and the data share one block), which looks even cooler

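#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One malloc: the row-pointer index and the zeroed element storage share a block. */
/* Note: the element storage starts right after the index, so it is aligned only to sizeof(void *). */
void **array2d(size_t rows, size_t cols, size_t value_size)
{
    size_t index_size = sizeof(void *) * rows;
    size_t store_size = value_size * rows * cols;

    char *a = (char *)malloc(index_size + store_size);
    if (!a)
        return NULL;

    memset(a + index_size, 0, store_size);
    /* Each index entry points at the start of its row inside the same block. */
    for (size_t i = 0; i < rows; ++i)
        ((void **)a)[i] = a + index_size + i * cols * value_size;

    return (void **)a;
}

int main(void)
{
    int **a = (int **)array2d(5, 5, sizeof(int));
    assert(a);

    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 5; j++)
            a[i][j] = i * j;

    for (int i = 0; i < 5; i++)
    {
        for (int j = 0; j < 5; j++)
            printf("%i\t", a[i][j]);
        printf("\n");
    }

    free(a);   /* a single free releases both the index and the data */
    return 0;
}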

    References:

  1. http://stackoverflow.com/questions/455960/dynamic-allocating-array-of-arrays-in-c
  2. http://c-faq.com/aryptr/dynmuldimary.html
Categories: algorithm & DS

Summary [4]

July 1, 2010

A brief summary of the past two weeks of GSoC. Having just released the first BugStats version, 1.0, I am considering adding new features and enhancing the existing ones. These need to be implemented in the coming weeks.

  1. Add more statistical information about users in Bugzilla, perhaps including the CC list, the QA field, bug patches, and bug reviewers; these items have not been verified yet.
  2. All of this statistical information may be grouped by product in Bugzilla and then displayed.
  3. The UI needs to be improved.
Categories: GSoC2010

Graph clustering algorithm

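June 30, 2010

First of all, this is a classic problem, but I still only half understand some of the algorithms; my weak fundamentals are the real issue.

The methods I have seen fall into two main categories: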

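  1. Graph partitioning based on minimum cut or spectral partitioning. Put simply, it minimizes the connections between different clusters. For details see http://www.cs.berkeley.edu/~demmel/cs267/lecture20/lecture20.html. This approach has some shortcomings: it can only bisect at each step, and when the number of clusters is not known in advance and repeated bisection is used, minimizing the connections between clusters is not a good criterion. Still, I find the bisection idea ingenious; it draws on linear algebra and matrices, so the mathematical backing is strong.
  2. Modularity. The basic idea is "There must be a smaller than expected number edges between communities".
    Define modularity to be Q = (number of edges within groups) - (expected number within groups). For details, google "Modularity and community structure in networks".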
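
To make the modularity definition concrete, here is a small sketch. It assumes the commonly used normalized form Q = (1/2m) * sum over same-group pairs (i, j) of (A_ij - k_i * k_j / 2m), where A is the adjacency matrix, k_i is the degree of node i, and m is the number of edges, so "expected" means the expected edge count in a random graph with the same degrees. The class name `ModularityDemo` and the toy graph (two triangles joined by one edge) are made up for illustration.

```java
public class ModularityDemo {
    // adj: symmetric 0/1 adjacency matrix of an undirected graph without self-loops.
    // community[i]: group label assigned to node i.
    static double modularity(int[][] adj, int[] community) {
        int n = adj.length;
        int[] degree = new int[n];
        int twoM = 0; // sum of all degrees = 2 * number of edges
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                degree[i] += adj[i][j];
                twoM += adj[i][j];
            }
        }
        if (twoM == 0) {
            return 0.0; // no edges: modularity is taken as 0 here
        }
        double q = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (community[i] == community[j]) {
                    // actual edge minus the expected number under the degree-preserving null model
                    q += adj[i][j] - (double) degree[i] * degree[j] / twoM;
                }
            }
        }
        return q / twoM;
    }

    public static void main(String[] args) {
        // Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
        int[][] adj = {
            {0, 1, 1, 0, 0, 0},
            {1, 0, 1, 0, 0, 0},
            {1, 1, 0, 1, 0, 0},
            {0, 0, 1, 0, 1, 1},
            {0, 0, 0, 1, 0, 1},
            {0, 0, 0, 1, 1, 0},
        };
        int[] community = {0, 0, 0, 1, 1, 1};
        System.out.printf("Q = %.3f%n", modularity(adj, community)); // Q = 0.357 (exactly 5/14)
    }
}
```

Putting every node into a single group gives Q = 0 under this form, which is why a positive Q indicates more within-group edges than expected.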

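Implementing these algorithms seems quite hard; fortunately there are some open-source packages, for example ones in MATLAB, the Graclus software, and Graph Clustering.

These algorithms are all fairly time-consuming and have high complexity, so large-scale data becomes a problem.

Categories: interesting research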

Summary [3]

June 15, 2010

A brief summary of last week, from 7 June to 13 June.

It was an exciting week for my GSoC. I released the first version of my extension, BugStats, and it fulfills the basic functions expected.

The main steps follow:

  • Add a new hook, "additional_user_data", in bugzilla/template/en/default/global/user.html.tmpl.
  • Add a hyperlink template that points to the personal stats page, in BugStats/template/en/default/hook/global/user-additional-user-data.html.tmpl.
  • For the personal stats page, use the existing hook "page_before_template", which is often used for adding newly defined pages. The sub "page_before_template" in BugStats/template/Extension.pm passes the stats information to the stats page template, which is located in BugStats/template/en/default/pages/stats/user.html.tmpl.

Besides, while coding I learned some basics of the Template Toolkit, which is smart and useful.

Demo: bugstats

Categories: GSoC2010

Summary [2]

June 6, 2010

Summary for this week:

1. While reading the Bugzilla source code, I ran into some Perl language problems, so I learned some advanced Perl features, including references, DBI, and templates.

2. My GSoC task is to create an extension that collects and displays statistics about a user in Bugzilla. First, I need to understand how Bugzilla extensions work, and then how to write one. I am trying to understand two extensions in Bugzilla: /extensions/example and /extensions/voting.

3. Besides, I have begun writing a "HelloWorld" extension, which first collects some stats including #comments, #fixedBug, and #SubmitedBug.

I implemented the hook sub page_before_template:


sub page_before_template {
    my ($self, $args) = @_;
    my $page = $args->{page_id};
    my $vars = $args->{vars};

   if ($page =~ m{^stats/user\.}) {
       _page_user($vars);
    }
}

sub _page_user {
    my ($vars) = @_;
    my $dbh = Bugzilla->dbh;
    my $user = Bugzilla->user;
    my $input = Bugzilla->input_params;
    my $who_id = $input->{user_id} || $user->id;
    my $who = Bugzilla::User->check({ id => $who_id });

    # Collect the matching bug IDs and their count for each statistic type
    my (@sql_statements, %all_bug_ids,@all_bug_cnts, $id, $sql_state);
    my @types= qw( #bugs_reported #bugs_assigned #comment #voting #cc #qa #patch );

    $sql_statements[0] = "SELECT bugs.bug_id FROM  bugs  WHERE bugs.reporter = ?";
    $sql_statements[1] = "SELECT bugs.bug_id FROM  bugs  WHERE bugs.assigned_to = ?";
    $sql_statements[2] = "SELECT DISTINCT longdescs.bug_id FROM longdescs  WHERE longdescs.who = ?";
    $sql_statements[3] = "SELECT votes.bug_id FROM votes  WHERE votes.who = ?";
    $sql_statements[4] = "SELECT cc.bug_id FROM  cc WHERE cc.who = ?";
    $sql_statements[5] = "SELECT bugs.bug_id FROM bugs WHERE bugs.qa_contact = ?";
    $sql_statements[6] = "SELECT attachments.bug_id FROM attachments WHERE attachments.submitter_id = ? AND attachments.ispatch = 1";
    
    for (my $index = 0; $index < @sql_statements; $index++){
        my $sth = $dbh->prepare($sql_statements[$index]);
        $sth->execute($who->id);
        my @bug_ids;
        while(($id) = $sth->fetchrow_array())
        {
            push (@bug_ids, $id);
        }
        $all_bug_ids{$types[$index]} = [@bug_ids];
    
        my $cnt = @bug_ids;
        push (@all_bug_cnts, $cnt);
        $sth->finish();
    }


    # Calculate the point score for the user. Each count gets + 1 so that log() never receives 0 (log(0) is a fatal error in Perl).
    my $point = log($all_bug_cnts[0] + 1) + log($all_bug_cnts[1]+ 1)*2 + log($all_bug_cnts[2] + 1)/log(10) + log($all_bug_cnts[3]+ 1) + log($all_bug_cnts[4] + 1) + 3*log($all_bug_cnts[6] + 1);

    $vars->{'all_bugs'} = \%all_bug_ids;
    $vars->{'point'} = $point;
    $vars->{'user'} = $who;
    #$vars->{'types'} = ['#bugs_reported', '#bugs_assigned', '#comment', '#voting', '#cc', '#qa','#patch'];
}

ps: DBI: http://www.felixgers.de/teaching/perl/perl_DBI.html

http://template-toolkit.org/docs/tutorial/Web.html

Categories: GSoC2010, Perl