Refine bugzilla extension code
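Nasty bugs from the past week
1. It has been a while since I last wrote Java. I used to learn it as I went, and could not write anything without Google.
The problem: the key of a HashMap was a custom object, MyObject. To get the lookup I needed, I had to override equals() and hashCode() myself. I knew all that, yet it still went wrong: inside my equals() I compared two strings with "==" when I should have used String's equals(), so the key could never be found in the HashMap.
So what is the difference between "==" and equals()?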
String s1 = new String("str"); String s2 = new String("str");
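Comparing them with == returns false, because two separate objects were created and they live at different addresses in memory.
equals() returns true. It is a method of java.lang.Object, and since every Java class inherits from Object by default, every class has it. The default Object.equals() still compares references, just like ==; String overrides it to compare the characters, which is why it returns true here.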
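A minimal sketch of the fix described above, assuming MyObject wraps a single String field (the field name and the EqualsDemo class are made up for illustration, not the original code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class EqualsDemo {
    // Hypothetical key class; the single "name" field is an assumption.
    static final class MyObject {
        private final String name;

        MyObject(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof MyObject)) return false;
            // Compare the string contents, not the references.
            return Objects.equals(name, ((MyObject) other).name);
        }

        @Override
        public int hashCode() {
            // Must agree with equals(): equal names give equal hash codes.
            return Objects.hashCode(name);
        }
    }

    // Stores a value under one key, then looks it up with a second key
    // that is built from a different String object.
    public static Integer lookup(String storedName, String queryName) {
        Map<MyObject, Integer> map = new HashMap<>();
        map.put(new MyObject(new String(storedName)), 1);
        return map.get(new MyObject(new String(queryName)));
    }

    public static void main(String[] args) {
        String s1 = new String("str");
        String s2 = new String("str");
        System.out.println(s1 == s2);       // false: two distinct objects
        System.out.println(s1.equals(s2));  // true: same characters
        System.out.println(lookup("str", "str"));
    }
}
```

With "==" inside equals(), the two keys in lookup() would hash to the same bucket but never compare equal, so get() would return null.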
2. HadoopUtils::toInt(const string & str); kept throwing exceptions. I tried to fix it n times, and each run takes an hour!!! faint!
In short, programs have to be exact, and code review matters.
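Dynamic 2D arrays in C
“Allocating and freeing the storage behind a two-dimensional pointer, int **ptr”
1. Number of allocation calls (new[] in this version, playing the role of malloc): (rows + 1)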
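#include <iostream>
#include <new>

// Allocates an m x n array: one new[] for the row pointers plus one per row.
// Returns NULL on invalid input or when an allocation fails.
int** New2DPointer(int m, int n) {
    if (m <= 0 || n <= 0) {
        std::cout << "invalid input parameters\n";
        return NULL;
    }
    int **ptr = NULL;
    int i = 0;
    try {
        ptr = new int*[m];
        for (; i != m; ++i) {
            ptr[i] = new int[n];
        }
    } catch (const std::bad_alloc &) {
        std::cout << "Error allocating memory." << std::endl;
        // Release the rows allocated before the failure, then the index.
        while (i-- > 0) {
            delete [] ptr[i];
        }
        delete [] ptr;  // deleting NULL is a no-op
        ptr = NULL;
    }
    return ptr;
}

// Frees every row (p[i], for i in 0..m-1) and then the row-pointer array.
void Delete2DPointer(int **p, int m) {
    if (!p) return;
    for (int i = 0; i != m; ++i) {
        delete [] p[i];
    }
    delete [] p;
}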
Number of malloc calls: (1 + 1)
#include <stdlib.h>

/* Two mallocs: one for the row pointers, one contiguous block for the data.
   For rows > 0, release with free(array2d[0]); free(array2d); */
int **array2d_new(size_t rows, size_t cols) {
    int **array2d, **end, **cur;
    int *array;

    cur = array2d = malloc(rows * sizeof(int *));
    if (!array2d)
        return NULL;

    array = malloc(rows * cols * sizeof(int));
    if (!array) {
        free(array2d);
        return NULL;
    }

    /* Point each row at its slice of the data block. */
    end = array2d + rows;
    while (cur != end) {
        *cur = array;
        array += cols;
        cur++;
    }
    return array2d;
}
Number of malloc calls: 1 (the row pointers and the data share a single block), which looks even slicker
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One malloc: the row-pointer index and the zeroed data live in the same
   block, so a single free() releases everything. The data starts at a
   multiple of sizeof(void *), which is enough alignment for int. */
void **array2d(size_t rows, size_t cols, size_t value_size) {
    size_t index_size = sizeof(void *) * rows;
    size_t store_size = value_size * rows * cols;
    char *a = (char *)malloc(index_size + store_size);
    if (!a)
        return NULL;
    memset(a + index_size, 0, store_size);
    for (size_t i = 0; i < rows; ++i)
        ((void **)a)[i] = a + index_size + i * cols * value_size;
    return (void **)a;
}

int main(void) {
    int **a = (int **)array2d(5, 5, sizeof(int));
    assert(a);
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 5; j++)
            a[i][j] = i * j;
    for (int i = 0; i < 5; i++) {
        for (int j = 0; j < 5; j++)
            printf("%i\t", a[i][j]);
        printf("\n");
    }
    free(a);
    return 0;
}
Summary [4]
A brief summary of the last two weeks of GSoC. Having just released the first version of BugStats, 1.0, I am considering adding new features and enhancing the existing ones. These need to be implemented in the coming weeks.
- Add more statistics about users in Bugzilla, perhaps including the CC list, QA field, bug patches, and bug reviewers; these items have not been verified yet.
- All of these statistics may be grouped by Bugzilla product and then displayed.
- The UI needs to be improved.
Graph clustering algorithm
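First of all, this is a classic problem, but even after reading about some of the algorithms I only half understand them; my weak fundamentals are the real issue.
The methods I have seen fall into two main categories:
- Graph partitioning based on minimum cut or spectral partitioning. Put simply, it minimizes the connections between different clusters. For details see http://www.cs.berkeley.edu/~demmel/cs267/lecture20/lecture20.html This approach has some shortcomings: it can only bisect, and when the number of clusters is not known in advance, repeated bisection that minimizes the connections between clusters is not a good criterion. Still, I find the idea of bisection quite elegant: it rests on linear algebra and matrices, so the mathematical support is strong.
- Modularity. The basic idea is “There must be a smaller than expected number of edges between communities”.
Define modularity to be Q = (number of edges within groups) - (expected number within groups). For details, google “Modularity and community structure in networks”.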
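To make the modularity definition above concrete, here is a small sketch for an undirected, unweighted graph. It uses the normalized form from Newman's paper, where both terms are divided by the number of edges m. The class name, the edge-list representation, and the example graph are my own assumptions for illustration, and the graph is assumed to have at least one edge.

```java
public class ModularityDemo {
    /**
     * Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ],
     * where m is the number of edges, L_c the number of edges inside c,
     * and d_c the sum of the degrees of the nodes in c.
     *
     * edges[k] = {u, v} is one undirected edge; community[u] is the
     * community index (0 .. numCommunities-1) of node u.
     */
    public static double modularity(int[][] edges, int[] community, int numCommunities) {
        int m = edges.length;
        double[] inside = new double[numCommunities];
        double[] degreeSum = new double[numCommunities];
        for (int[] e : edges) {
            int cu = community[e[0]];
            int cv = community[e[1]];
            degreeSum[cu] += 1;   // each endpoint adds one to its community's degree
            degreeSum[cv] += 1;
            if (cu == cv) {
                inside[cu] += 1;  // edge lies within a single community
            }
        }
        double q = 0.0;
        for (int c = 0; c < numCommunities; c++) {
            double fractionInside = inside[c] / m;
            double expected = degreeSum[c] / (2.0 * m);
            q += fractionInside - expected * expected;
        }
        return q;
    }

    public static void main(String[] args) {
        // Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
        int[][] edges = {{0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3}};
        int[] split = {0, 0, 0, 1, 1, 1};   // one community per triangle
        int[] merged = {0, 0, 0, 0, 0, 0};  // everything in one community
        System.out.println(modularity(edges, split, 2));
        System.out.println(modularity(edges, merged, 1));
    }
}
```

Splitting the graph into its two triangles gives Q = 5/14, while putting every node into one community gives Q = 0, which matches the intuition that a good division has more edges inside groups than expected by chance.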
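Implementing these algorithms seems quite hard. Fortunately there are some ready-made packages, for example ones available for MATLAB, the Graclus software, and Graph Clustering.
These algorithms are all fairly time-consuming and have high complexity, so large-scale data becomes a problem.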
Summary [3]
A brief summary of last week, from 7 June to 13 June.
It was an exciting week for my GSoC. I released the first version of my extension, BugStats, and it fulfills the basic functions expected.
The main steps were:
- add a new hook “additional_user_data” located in bugzilla/template/en/default/global/user.html.tmpl.
- add a hyperlink template that points to the personal stats page, in BugStats/template/en/default/hook/global/user-additional-user-data.html.tmpl
- for the personal stats page, use the existing hook “page_before_template”, which is commonly used to add newly defined pages. In BugStats/template/Extension.pm, the sub “page_before_template” passes the stats to the stats page template, which is located in BugStats/template/en/default/pages/stats/user.html.tmpl
Along the way I also learned some basics of the Template Toolkit, which is smart and useful.
Demo:bugstats
Summary [2]
Summary for this week:
1. While reading the Bugzilla source code, I ran into some Perl language problems, so I studied some advanced Perl features, including references, DBI, and templates.
2. My task for GSoC is to create an extension that collects and shows statistics about a user in Bugzilla. So first I need to understand how Bugzilla extensions work, and then how to write one. In Bugzilla, I am trying to understand two extensions: /extensions/example; /extension/voting.
3. Besides, I have begun writing a “HelloWorld” extension, which first collects some stats including #comments, #fixedBug, #SubmitedBug.
I implemented the hook sub page_before_template:
sub page_before_template {
    my ($self, $args) = @_;
    my $page = $args->{page_id};
    my $vars = $args->{vars};
    if ($page =~ m{^stats/user\.}) {
        _page_user($vars);
    }
}

sub _page_user {
    my ($vars) = @_;
    my $dbh = Bugzilla->dbh;
    my $user = Bugzilla->user;
    my $input = Bugzilla->input_params;
    my $who_id = $input->{user_id} || $user->id;
    my $who = Bugzilla::User->check({ id => $who_id });

    my (@sql_statements, %all_bug_ids, @all_bug_cnts);
    # One label per query below, in the same order.
    my @types = ('#bugs_reported', '#bugs_assigned', '#comment', '#voting',
                 '#cc', '#qa', '#patch');
    $sql_statements[0] = "SELECT bugs.bug_id FROM bugs WHERE bugs.reporter = ?";
    $sql_statements[1] = "SELECT bugs.bug_id FROM bugs WHERE bugs.assigned_to = ?";
    $sql_statements[2] = "SELECT DISTINCT longdescs.bug_id FROM longdescs WHERE longdescs.who = ?";
    $sql_statements[3] = "SELECT votes.bug_id FROM votes WHERE votes.who = ?";
    $sql_statements[4] = "SELECT cc.bug_id FROM cc WHERE cc.who = ?";
    $sql_statements[5] = "SELECT bugs.bug_id FROM bugs WHERE bugs.qa_contact = ?";
    $sql_statements[6] = "SELECT attachments.bug_id FROM attachments WHERE attachments.submitter_id = ? AND attachments.ispatch = 1";

    # Run each query for this user and keep both the bug ids and their count.
    for (my $index = 0; $index < @sql_statements; $index++) {
        my $sth = $dbh->prepare($sql_statements[$index]);
        $sth->execute($who->id);
        my @bug_ids;
        while (my ($id) = $sth->fetchrow_array()) {
            push(@bug_ids, $id);
        }
        $all_bug_ids{$types[$index]} = [@bug_ids];
        push(@all_bug_cnts, scalar @bug_ids);
        $sth->finish();
    }

    # Calculate the points for this user. Every count gets +1 so that
    # log(0) can never occur.
    my $point = log($all_bug_cnts[0] + 1)
              + log($all_bug_cnts[1] + 1) * 2
              + log($all_bug_cnts[2] + 1) / log(10)
              + log($all_bug_cnts[3] + 1)
              + log($all_bug_cnts[4] + 1)
              + 3 * log($all_bug_cnts[6] + 1);

    $vars->{'all_bugs'} = \%all_bug_ids;
    $vars->{'point'} = $point;
    $vars->{'user'} = $who;
}
ps: DBI: http://www.felixgers.de/teaching/perl/perl_DBI.html
http://template-toolkit.org/docs/tutorial/Web.html
Summary [1]
After talking with my mentor, Guy Pyrzak (thanks for his advice), here is a brief summary of the past three weeks' work.
1. I have been learning Perl for several weeks, and I can now write some programs in it.
2. After talking with Guy, I have the following three main steps for the Social Extension for Bugzilla.
- I need to choose which stats about people are needed. Guy describes some suggestions at http://guy-pyrzak.blogspot.com/2010/03/ideas-for-user-stats-to-gether-for-more.html
- I need to consider how to gather, calculate and store these stats.
- I need to consider how to display them.
3. I have checked out Bugzilla and am about to start reading the source code of the related modules.
Google Summer of Code 2010 proposal for Mozilla Bugzilla
Main body of my Project Proposal
Bugzilla is an issue tracking system, and its users are the people who submit bugs or requirements and the people who fix these issues. Many people are clearly involved in developing a software system through Bugzilla. When looking at bug reports and comments, people often want to know “who is this guy?”, “what patches have they submitted?”, “how active are they in the project?”, etc. In this GSoC project, I want to implement an extension that collects and shows these statistics. With the help of the mentor, I will keep working on this project and submit my code and test cases.
Several main tasks need to be done.
First, though I have used Bugzilla for several software projects and read some of the Bugzilla help documents, I still have to read parts of Bugzilla's source code and the related specifications.
Second, as far as I know, implementing this extension requires several skills, including Perl, MySQL, CSS, JavaScript and so on. Frankly speaking, I am not familiar with Perl, but I have used C/C++ for 3 years, Java for 2 years and Python for half a year, and I have a good computer science background. Therefore I will have to pick up the languages as quickly as possible.
Third, the details of the extension: which statistics will be collected, such as past or recent activity on projects, components or packages. This activity information indicates how actively users are involved in the software development. I will discuss these issues with the mentor. I expect the extension to include several parts: collecting and showing statistics per user ID, and a social ranking system such as bugzilla.gnome.org's “Points”, which roughly reflects how active a person is. The current formula for “points” is:
log_10(1 + #comments) + log_2(1 + #bugs_closed) + log_2(1 + #bugs_reported)
We need to consider more information for this “points” formula. Since using Bugzilla for work is a social activity, we could also copy Twitter's pattern, letting users follow other people and build a network.
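The “points” formula quoted above can be sketched as follows. This is only an illustration of the arithmetic; the class and method names are made up, and the three counts are assumed to be non-negative.

```java
public class PointsDemo {
    // Base-2 logarithm via the change-of-base rule.
    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // points = log_10(1 + #comments) + log_2(1 + #bugs_closed) + log_2(1 + #bugs_reported)
    public static double points(int comments, int bugsClosed, int bugsReported) {
        return Math.log10(1 + comments)
             + log2(1 + bugsClosed)
             + log2(1 + bugsReported);
    }

    public static void main(String[] args) {
        // 9 comments, 1 closed bug, 3 reported bugs:
        // log_10(10) + log_2(2) + log_2(4) = 1 + 1 + 2
        System.out.println(points(9, 1, 3));
        // A user with no activity at all scores zero.
        System.out.println(points(0, 0, 0));
    }
}
```

The "1 +" inside each logarithm keeps the score defined for users with no activity, and the logarithms make the first few contributions count for much more than later ones.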
Lastly, coding and implementing the extension.
Schedule of Deliverables
The total time available for GSoC is about 16 weeks.
- Week 1 – Week 2 (2 weeks): getting to know the community; studying the spec; discussing with the mentor.
- Week 3 – Week 5 (3 weeks): becoming familiar with the current code base and some languages (Perl, CSS).
- Week 7 (1 week): discussions with the mentor and the community on the spec's interpretation and how the extension might best be implemented.
- Week 8 – Week 12 (5 weeks): coding and testing.
- Week 13 – Week 14 (2 weeks): tidying up any loose ends, fixing some bugs, and cleaning up the source code.
- Week 15 – Week 16 (2 weeks): summarizing the project and ensuring the code is integrated and made available on Google Code and in the Mozilla repositories.
About me:
I graduated with a B.S. degree in Information System from Beijing Normal University, China, in 2008, ranked 10th of 50. Now I am pursuing a master's degree at the Institute of Software, Chinese Academy of Sciences. My current research interests include using data mining and machine learning to analyze software engineering (SE) data and support SE.
I interned in the Search Engine Department of Tencent Inc. (the largest Internet company in China and the third largest in the world) from January to April 2010.