Refining the Bugzilla extension code

July 17, 2010

After I hosted the source code on GitHub, my mentor Guy suggested that I refine it. I have now finished that job and committed the code to GitHub.

PS: my most used git commands:

  1. git add .
  2. git commit -m "your comments"
  3. git push origin master
Categories: GSoC2010

Nasty bugs from the past week

July 9, 2010

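1. I hadn't written Java for a while; in the past I mostly learned it as I went, the kind of coder who can't write a line without Google.
The problem: the key of a HashMap was a custom class, MyObject. To get the lookups I needed, I had to override equals() and hashCode() myself, which I knew, but it still went wrong. Inside my equals() I compared two strings with “==” instead of String's equals(), so the HashMap could never find my keys.
So what is the difference between “==” and “equals()”?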

String s1 = new String("str");
String s2 = new String("str");

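Comparing them with == returns false: two separate objects were created, and == compares references, i.e. their locations in memory, which differ.
s1.equals(s2) returns true. equals() is a method of java.lang.Object, and since every Java class inherits from Object, every class has it. String overrides it to compare the characters; Object's default version compares references just like ==, which is why a class used as a HashMap key must override equals() (and hashCode(), so that equal keys hash alike).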

2. HadoopUtils::toInt(const string & str) kept throwing exceptions. I tried fixing it n times, and every run takes an hour!!! faint!
In short: programs have to be precise, and code review matters.
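Returning to the first bug: C++ has the same contract. A custom key in std::unordered_map needs an equality operator (the counterpart of equals()) and a hash function (the counterpart of hashCode()), and comparing two const char * strings with == compares addresses, just like == on Java String objects. The sketch below is a C++ analogue, not the original Java code; MyKey, its fields and MyKeyHash are hypothetical names standing in for the post's MyObject.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type, standing in for the post's MyObject.
struct MyKey {
    std::string name;
    int version;
};

// Counterpart of equals(): compare contents, not addresses.
// std::string's operator== compares characters; comparing two
// `const char *` pointers with == would compare addresses instead,
// which is the same mistake as using == on Java Strings.
bool operator==(const MyKey &a, const MyKey &b) {
    return a.name == b.name && a.version == b.version;
}

// Counterpart of hashCode(): keys that compare equal must hash equally,
// so the hash is built only from the fields operator== looks at.
struct MyKeyHash {
    std::size_t operator()(const MyKey &k) const {
        std::size_t h = std::hash<std::string>{}(k.name);
        // Mix in the second field (boost-style hash combine).
        h ^= std::hash<int>{}(k.version) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

using MyMap = std::unordered_map<MyKey, int, MyKeyHash>;
```

Looking up a freshly constructed MyKey{"str", 1} finds the entry stored under an equal but distinct key, because both operator== and MyKeyHash depend only on the contents.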

Categories: Program

Dynamic two-dimensional arrays in C

July 1, 2010

“Allocating and freeing a two-dimensional pointer array, int **ptr”

1. Number of allocation calls: rows + 1 (this first version is C++ and uses new[] rather than malloc)

 
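#include <iostream>
#include <new>

// Allocates an m x n int matrix as m separate rows (m + 1 calls to new[]).
// Returns nullptr on invalid input or allocation failure.
int **New2DPointer(int m, int n)
{
    if (m <= 0 || n <= 0)
    {
        std::cout << "invalid input parameters\n";
        return nullptr;
    }

    int **ptr = nullptr;
    int allocated = 0;  // number of rows allocated so far
    try
    {
        ptr = new int *[m];
        for (; allocated != m; ++allocated)
            ptr[allocated] = new int[n];
    }
    catch (const std::bad_alloc &)
    {
        std::cout << "Error allocating memory." << std::endl;
        // Release whatever was allocated before the failure.
        if (ptr)
        {
            for (int i = 0; i != allocated; ++i)
                delete[] ptr[i];
            delete[] ptr;
        }
        return nullptr;
    }
    return ptr;
}

// Frees a matrix created by New2DPointer: every row, then the row-pointer array.
void Delete2DPointer(int **p, int m)
{
    if (!p)
        return;
    for (int i = 0; i != m; ++i)
        delete[] p[i];
    delete[] p;
}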

2. Number of malloc calls: 1 + 1 (one for the row pointers, one for all the elements)

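#include <stdlib.h>

/* Allocates a rows x cols int matrix with two malloc calls: one for the
 * row-pointer index and one contiguous block for all elements.
 * Release it with free(array2d[0]) followed by free(array2d). */
int **array2d_new(size_t rows, size_t cols)
{
    int **array2d, **end, **cur;
    int *array;

    cur = array2d = malloc(rows * sizeof(int *));
    if (!array2d)
        return NULL;

    array = malloc(rows * cols * sizeof(int));
    if (!array)
    {
        free(array2d);
        return NULL;
    }

    /* Point each row at its slice of the contiguous block. */
    end = array2d + rows;
    while (cur != end)
    {
        *cur = array;
        array += cols;
        cur++;
    }

    return array2d;
}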

3. Number of malloc calls: 1 (the index and the data share a single block), which looks even cooler

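#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocates a rows x cols array of value_size-byte elements with a single
 * malloc: the first index_size bytes hold the row pointers, the rest holds
 * the zero-filled elements, so one free() releases everything.
 * Assumes the element type needs no stricter alignment than void * and that
 * int * and void * share a representation (both true on common platforms). */
void **array2d(size_t rows, size_t cols, size_t value_size)
{
    size_t index_size = sizeof(void *) * rows;
    size_t store_size = value_size * rows * cols;

    char *a = (char *)malloc(index_size + store_size);
    if (!a)
        return NULL;

    memset(a + index_size, 0, store_size);
    for (size_t i = 0; i < rows; ++i)
        ((void **)a)[i] = a + index_size + i * cols * value_size;

    return (void **)a;
}

int main(void)
{
    int **a = (int **)array2d(5, 5, sizeof(int));
    assert(a);

    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 5; j++)
            a[i][j] = i * j;

    for (int i = 0; i < 5; i++)
    {
        for (int j = 0; j < 5; j++)
            printf("%i\t", a[i][j]);
        printf("\n");
    }

    free(a); /* one allocation, so one free */
    return 0;
}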
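In C++ the same single-block idea can be had without any manual freeing. A minimal sketch, assuming row-major storage: the hypothetical class Matrix2D keeps all elements in one std::vector<int> (one heap allocation, released automatically by its destructor) and maps element (i, j) to offset i * cols + j. It is an alternative to, not a translation of, the functions above; it has no row-pointer index, so access is m(i, j) rather than a[i][j].

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols int matrix backed by one contiguous
// std::vector (a single heap allocation), freed automatically (RAII).
class Matrix2D {
public:
    Matrix2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0) {}  // zero-filled

    // Row-major indexing: element (i, j) lives at offset i * cols + j.
    int &operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    int operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Pointer to the start of row i, so m.row(i)[j] reads like a[i][j].
    int *row(std::size_t i) { return data_.data() + i * cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<int> data_;
};
```

Filling it with i * j as in main() gives m(4, 3) == 12, and because rows are adjacent in memory, &m(1, 0) is exactly one element past &m(0, 4).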
References:

  1. http://stackoverflow.com/questions/455960/dynamic-allocating-array-of-arrays-in-c
  2. http://c-faq.com/aryptr/dynmuldimary.html
Categories: algorithm & DS

Summary[4]

July 1, 2010

A brief summary of the past two weeks of GSoC. Having just released the first version (1.0) of BugStats, I am considering adding new features and improving existing ones, to be implemented over the coming weeks:

  1. Add more statistics about Bugzilla users, possibly covering CC lists, the QA Contact field, bug patches and bug reviewers (these items have not been verified yet).
  2. Possibly group all of these statistics by Bugzilla product before displaying them.
  3. Improve the UI.
Categories: GSoC2010

Graph clustering algorithm

June 30, 2010

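This is a classic problem, but after reading about some of the algorithms I still only half understand them; my weak foundations are the problem.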

The methods I have seen fall mainly into two classes:

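  1. Graph partitioning based on minimum cut or spectral partitioning. Simply put, it minimizes the connections between different clusters (details: http://www.cs.berkeley.edu/~demmel/cs267/lecture20/lecture20.html). This approach has some drawbacks: it only bisects the graph at each step, and when the number of clusters is not known in advance, minimizing the connections between clusters over repeated bisections is not a good criterion. Still, I find the bisection idea elegant: it builds on linear algebra and matrices, so it has strong mathematical backing.
  2. Modularity. The basic idea is that “there must be a smaller than expected number of edges between communities”.
    Define modularity to be Q = (number of edges within groups) - (expected number within groups). For details, google “Modularity and community structure in networks”.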
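To make the modularity definition concrete: for an undirected graph with m edges, adjacency matrix A, vertex degrees k_i and community labels c_i, the standard form is Q = (1/2m) Σ_ij [A_ij - k_i k_j / (2m)] δ(c_i, c_j), which regroups per community as Q = Σ_c [L_c / m - (d_c / 2m)^2], where L_c counts the edges inside community c and d_c is the total degree of its vertices. The C++ sketch below evaluates the second form for a given partition of a simple graph. It only scores a partition and does not search for one; the function name and the edge-list input format are assumptions.

```cpp
#include <utility>
#include <vector>

// Modularity Q of a partition of an undirected, unweighted simple graph.
// edges: each undirected edge listed once as (u, v); vertices are 0..n-1.
// community[v]: community label of vertex v, in 0..num_communities-1.
// Uses the per-community form Q = sum_c [ L_c / m - (d_c / (2m))^2 ].
double modularity(const std::vector<std::pair<int, int>> &edges,
                  const std::vector<int> &community, int num_communities) {
    const double m = static_cast<double>(edges.size());
    if (m == 0) return 0.0;  // modularity is undefined without edges

    std::vector<double> internal(num_communities, 0.0);  // L_c
    std::vector<double> degree(num_communities, 0.0);    // d_c
    for (const auto &e : edges) {
        int cu = community[e.first];
        int cv = community[e.second];
        degree[cu] += 1.0;  // each edge adds 1 to the degree of both ends
        degree[cv] += 1.0;
        if (cu == cv) internal[cu] += 1.0;
    }

    double q = 0.0;
    for (int c = 0; c < num_communities; ++c) {
        double frac = degree[c] / (2.0 * m);
        q += internal[c] / m - frac * frac;
    }
    return q;
}
```

For two triangles joined by a single edge (7 edges), splitting the graph into its two triangles gives Q = 2 × (3/7 - 1/4) = 5/14 ≈ 0.357, while putting every vertex into one community gives Q = 0.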

Implementing these algorithms looks quite hard; fortunately there are open-source packages, e.g. for MATLAB, the Graclus software and Graph Clustering.

All of these algorithms are fairly time-consuming and have high complexity, so large-scale data becomes a problem.

Categories: interesting reserch

Summary [3]

June 15, 2010

A brief summary of last week, from 7 June to 13 June.

It was an exciting week for my GSoC: I released the first version of my extension, BugStats, and it provides the basic functionality expected.

The main steps:

  • Add a new hook, “additional_user_data”, in bugzilla/template/en/default/global/user.html.tmpl.
  • Add a hyperlink template that points to the personal stats page, in BugStats/template/en/default/hook/global/user-additional-user-data.html.tmpl.
  • For the personal stats page itself, use the existing hook “page_before_template”, which is commonly used to add newly defined pages. The sub “page_before_template” in BugStats/template/Extension.pm passes the statistics to the stats page template, located at BugStats/template/en/default/pages/stats/user.html.tmpl.

Along the way I also learned the basics of the Template Toolkit, which is clever and useful.

Demo: bugstats

Categories: GSoC2010

Summary [2]

June 6, 2010

Summary for this week:

1. While reading the Bugzilla source code, I ran into some Perl language problems, so I studied some advanced Perl features, including references, DBI and templates.

2. My GSoC task is to create an extension that collects statistics about a Bugzilla user and shows them. So first I need to understand how Bugzilla extensions work, and then how to write one. I studied two of the extensions in Bugzilla: /extensions/example and /extensions/voting.

3. I also started writing a “HelloWorld” extension, which for a start collects a few stats, including #comments, #fixedBug and #SubmitedBug.

I implemented the hook sub page_before_template:


sub page_before_template {
    my ($self, $args) = @_;
    my $page = $args->{page_id};
    my $vars = $args->{vars};

    if ($page =~ m{^stats/user\.}) {
        _page_user($vars);
    }
}

sub _page_user {
    my ($vars) = @_;
    my $dbh = Bugzilla->dbh;
    my $user = Bugzilla->user;
    my $input = Bugzilla->input_params;
    my $who_id = $input->{user_id} || $user->id;
    my $who = Bugzilla::User->check({ id => $who_id });

    # One query per statistic; each returns the ids of the bugs that match the user.
    my (@sql_statements, %all_bug_ids,@all_bug_cnts, $id, $sql_state);
    my @types = ('#bugs_reported', '#bugs_assigned', '#comment', '#voting', '#cc', '#qa', '#patch');

    $sql_statements[0] = "SELECT bugs.bug_id FROM  bugs  WHERE bugs.reporter = ?";
    $sql_statements[1] = "SELECT bugs.bug_id FROM  bugs  WHERE bugs.assigned_to = ?";
    $sql_statements[2] = "SELECT DISTINCT longdescs.bug_id FROM longdescs  WHERE longdescs.who = ?";
    $sql_statements[3] = "SELECT votes.bug_id FROM votes  WHERE votes.who = ?";
    $sql_statements[4] = "SELECT cc.bug_id FROM  cc WHERE cc.who = ?";
    $sql_statements[5] = "SELECT bugs.bug_id FROM bugs WHERE bugs.qa_contact = ?";
    $sql_statements[6] = "SELECT attachments.bug_id FROM attachments WHERE attachments.submitter_id = ? AND attachments.ispatch = 1";
    
    for (my $index = 0; $index < @sql_statements; $index++){
        my $sth = $dbh->prepare($sql_statements[$index]);
        $sth->execute($who->id);
        my @bug_ids;
        while(($id) = $sth->fetchrow_array())
        {
            push (@bug_ids, $id);
        }
        $all_bug_ids{$types[$index]} = [@bug_ids];
    
        my $cnt = @bug_ids;
        push (@all_bug_cnts, $cnt);
        $sth->finish();
    }


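    # Calculate the user's points. log() is the natural log; dividing by log(10)
    # gives log10. Adding 1 avoids log(0) when a count is zero.
    my $point = log($all_bug_cnts[0] + 1) + 2 * log($all_bug_cnts[1] + 1)
              + log($all_bug_cnts[2] + 1) / log(10) + log($all_bug_cnts[3] + 1)
              + log($all_bug_cnts[4] + 1) + 3 * log($all_bug_cnts[6] + 1);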

    $vars->{'all_bugs'} = \%all_bug_ids;
    $vars->{'point'} = $point;
    $vars->{'user'} = $who;
    #$vars->{'types'} = ['#bugs_reported', '#bugs_assigned', '#comment', '#voting', '#cc', '#qa','#patch'];
}

ps: DBI: http://www.felixgers.de/teaching/perl/perl_DBI.html

http://template-toolkit.org/docs/tutorial/Web.html

Categories: GSoC2010, Perl

Summary [1]

May 28, 2010

After talking with my mentor Guy.Pyrzak (thanks for his advice), I made a brief summary of the past three weeks' work.

1. After learning Perl for several weeks, I can now use it to write some programs.

2. After talking with Guy, I worked out three main steps for the Social Extension for Bugzilla.

3. I have checked out Bugzilla and am about to start reading the source code of the related modules.

Categories: GSoC2010

Google Summer of Code 2010 proposal for Mozilla Bugzilla

May 28, 2010

Main body of my Project Proposal

Bugzilla is an issue tracking system whose users are the people who submit bugs or requirements and the people who fix these issues. Many people are involved in Bugzilla in the course of developing a software system. When looking at bug reports and comments, people often want to know “who is this person?”, “what patches have they submitted?”, “how active are they in the project?”, etc. In this GSoC project, I want to implement, with the help of my mentor, an extension that collects and shows these statistics, and to keep working on the project and submit my code and test cases.

Several main tasks need to be done.

First, although I have used Bugzilla for several software projects and read some of the Bugzilla help documentation, I need to read parts of Bugzilla's source code and the related specifications.

Second, as far as I know, implementing this extension requires several skills, including Perl, MySQL, CSS and JavaScript. Frankly speaking, I am not familiar with the Perl language, but I have used C/C++ for 3 years, Java for 2 years and Python for half a year, and I have a good computer science background. Therefore I will pick up the languages I need as quickly as possible.

Third, the details of the extension: what statistics will be collected, such as past or recent activity on projects, components or packages, and which user activity information indicates how actively people are involved in the software development. I will discuss these issues with my mentor. I believe the extension may include several parts: collecting and showing statistics per user ID, and a social ranking system such as the “Points” on bugzilla.gnome.org, which roughly reflects how active a person is. The current formula for “points” is:

points = log_10(1 + #comments) + log_2(1 + #bugs_closed) + log_2(1 + #bugs_reported)
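A worked example of this formula, as a minimal C++ sketch (the function name bugzilla_points is hypothetical): a user with 9 comments, 1 closed bug and 3 reported bugs scores log_10(10) + log_2(2) + log_2(4) = 1 + 1 + 2 = 4 points. The + 1 inside each logarithm keeps a zero count at 0 points instead of taking log(0).

```cpp
#include <cmath>

// Hypothetical helper implementing the "points" formula:
//   log_10(1 + #comments) + log_2(1 + #bugs_closed) + log_2(1 + #bugs_reported)
// Adding 1 inside each log maps a count of 0 to 0 points.
double bugzilla_points(unsigned comments, unsigned bugs_closed, unsigned bugs_reported) {
    return std::log10(1.0 + comments)
         + std::log2(1.0 + bugs_closed)
         + std::log2(1.0 + bugs_reported);
}
```

Because of the logarithms, each extra comment or bug is worth less than the previous one, so very active users are not rewarded linearly.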

We need to consider more information for this “points” formula. Since using Bugzilla for work is a social activity, we might also copy Twitter's pattern of following people to build a network.

Lastly, coding and implementing the extension.

Schedule of Deliverables

The total time available for GSoC is about 16 weeks:

  1. Week 1 – Week 2 (2 weeks): getting to know the community, studying the spec and discussing it with the mentor.
  2. Week 3 – Week 5 (3 weeks): becoming familiar with the current code base and with some languages (Perl, CSS).
  3. Week 7 (1 week): discussing with the mentor and the community how to interpret the spec and how the extension might best be implemented.
  4. Week 8 – Week 12 (5 weeks): coding and testing.
  5. Week 13 – Week 14 (2 weeks): tidying up any loose ends, fixing bugs and cleaning up the source code.
  6. Week 15 – Week 16 (2 weeks): summarizing the project and ensuring the code is integrated and made available on Google Code and in the Mozilla repositories.

About me:

I graduated with a B.S. degree in Information Systems from Beijing Normal University, China, in 2008, ranked 10/50. I am now pursuing a master's degree at the Institute of Software, Chinese Academy of Sciences. My current research interests include using data mining and machine learning to analyze software engineering (SE) data and support SE.

From January to April 2010 I interned in the Search Engine Department of Tencent Inc. (the largest Internet company in China and the third largest in the world).

Categories: GSoC2010

Hello world!

May 28, 2010

Wenjinwu’s Blog is up!

Categories: Uncategorized