Fig. 3. Excerpt of Pooka SVN Log.

3.6.3 CVS/SVN Commit Messages

Fig. 3 shows an excerpt of a Pooka commit. There are 3,762, 1,743, 3,261, and 8,079 SVN commits for jEdit, Pooka, Rhino, and SIP, respectively. With the help of FacTrace, we performed the data preprocessing steps described in Section 2.2.1 on all SVN commits. After preprocessing, we obtained 2,911, 1,393, 2,508, and 5,188 SVN commits for jEdit, Pooka, Rhino, and SIP, respectively. Many SVN commits did not concern source code files, and some commits referenced both source code files and other files. For example, revision 1604 in Pooka points only to HTML files except for one Java file, FolderInternalFrame.java. In such cases, we kept only the Java file and removed every reference to the other files (here, the HTML files). We stored all filtered SVN commit messages and their related files in a FacTrace database.

3.6.4 Bug Reports

We cannot use the jEdit [24] and Pooka bug reports because the former does not have a publicly available bug repository and the latter has too few recorded bugs (16). Rhino is part of the Mozilla browser, and its bug reports are available via the Mozilla Bugzilla bug tracker. We extracted all 770 bugs reported against Rhino and used Histrace to link them with the CVS repository, as described in Section 2.2. Histrace automatically linked 457 of these bug reports to their respective commits.

In the case of SIP, we downloaded 413 bug reports. When fixing bugs, SIP developers did not follow any convention for linking bug reports to commits; hence, the commit messages contained no bug IDs. However, developers referenced SVN revision numbers in the bug reports' comments; e.g., bug ID 237 contains the revision ID r4550. We therefore tuned Histrace's regular expression to find revision IDs in the descriptions of the SIP bug reports. Histrace thus extracted all the bug IDs and linked them to SVN commits.
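The revision-based linking used for SIP can be illustrated with a minimal sketch. It assumes that bug reports are available as (bug ID, text) pairs and that a revision ID is written as the letter "r" followed by digits, as in r4550. The pattern and the function name link_bugs_to_revisions are hypothetical illustrations, not Histrace's actual regular expression or API.

```python
import re

# Assumed pattern (not Histrace's): "r" followed by digits, e.g. "r4550".
# Word boundaries keep tokens such as "error404" or "r12x" from matching.
REVISION_ID = re.compile(r"\br(\d+)\b")


def link_bugs_to_revisions(bug_reports):
    """Map each bug ID to the sorted SVN revision numbers cited in its text.

    bug_reports: iterable of (bug_id, text) pairs (hypothetical input format).
    Bugs whose text cites no revision are left out of the result.
    """
    links = {}
    for bug_id, text in bug_reports:
        revisions = sorted({int(rev) for rev in REVISION_ID.findall(text)})
        if revisions:
            links[bug_id] = revisions
    return links


if __name__ == "__main__":
    # Toy reports: the comment texts are made up for illustration.
    reports = [
        (237, "Fixed in r4550, see also r4551."),
        (238, "Cannot reproduce; no fix committed."),
    ]
    print(link_bugs_to_revisions(reports))  # {237: [4550, 4551]}
```

Each cited revision number can then be looked up among the filtered SVN commits to obtain the bug-to-commit links; bugs whose comments cite no revision remain unlinked, which is one way a tool can end up linking only a subset of the downloaded reports.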
Overall, Histrace automatically linked 169 bugs reported against SIP to their respective commits.

3.6.5 Last Preprocessing Step

Using FacTrace, we automatically extracted all the identifiers from the jEdit, Pooka, Rhino, and SIP requirements, source code, filtered CVS/SVN commit messages, and filtered bug reports. The output of this step is four corpora that we use to create traceability links, as explained in Sections 2.2 and 2.3.
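The commit filtering of Section 3.6.3 and the identifier extraction described above can be sketched as follows. This is a minimal sketch under our own assumptions, not FacTrace's implementation: a commit is modeled as a revision number, a message, and a list of changed paths; only paths ending in .java are kept; commits left with no Java file are dropped; and identifiers are approximated as alphanumeric tokens split on underscores and camelCase boundaries, then lowercased. The names Commit, filter_commit, and extract_identifiers are hypothetical.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Commit:
    revision: int
    message: str
    paths: List[str]  # files touched by the commit


def filter_commit(commit: Commit) -> Optional[Commit]:
    """Keep only .java paths; drop the commit if none remain (assumption)."""
    java_paths = [p for p in commit.paths if p.endswith(".java")]
    if not java_paths:
        return None
    return replace(commit, paths=java_paths)


# Candidate identifiers: a letter followed by letters, digits, or underscores.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# Pieces of camelCase/PascalCase words: acronyms ("HTML" in "HTMLParser"),
# capitalized or lowercase words, all-caps runs, and digit runs.
CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def extract_identifiers(text: str) -> List[str]:
    """Split text into lowercase terms, splitting snake_case and camelCase."""
    terms = []
    for token in TOKEN.findall(text):
        for part in token.split("_"):
            terms.extend(piece.lower() for piece in CAMEL.findall(part))
    return terms


if __name__ == "__main__":
    # Toy commit loosely modeled on the revision 1604 example;
    # the revision number, message, and paths are made up.
    commit = Commit(1, "Fix getMessageCount in FolderInternalFrame",
                    ["doc/help.html", "doc/index.html",
                     "src/FolderInternalFrame.java"])
    kept = filter_commit(commit)
    print(kept.paths)  # ['src/FolderInternalFrame.java']
    print(extract_identifiers(kept.message))
    # ['fix', 'get', 'message', 'count', 'in', 'folder', 'internal', 'frame']
```

The same extract_identifiers function can be applied to requirements, source files, and bug report text, so that all artifacts share one vocabulary. Splitting compound identifiers is a common choice in information-retrieval-based traceability because it lets "FolderInternalFrame" in code match "folder" in a requirement; a fuller pipeline would typically also remove stop words such as "in" and Java keywords.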