How to Get the Number of Commits and Lines of Code in Pull Requests

February 4, 2021

According to research conducted by a Cisco Systems programming team to determine best practices for code review, a pull request should include no more than 200 to 400 lines of code. Keeping your pull requests within these limits not only speeds up the review; this amount of information is also optimal for the brain to process effectively at one time.

If you’d like to analyze your existing pull requests, counting lines of code manually for each one could take years, so we suggest automating the process with Awesome Graphs for Bitbucket and Python. This article shows how to build a report with pull request size statistics, in terms of lines of code and commits, at the repository level.

What you will get

As a result, you’ll get a CSV file with a detailed list of pull requests created during the specified period, including the number of commits and the lines of code added and deleted in each.
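Conceptually, each row of the report is an aggregation over the commits of one pull request. The sketch below isolates that step, assuming each per-commit response is a dict shaped like the Awesome Graphs commit response the script reads (a `linesOfCode` object with `added` and `deleted`, or an `errors` key when statistics are unavailable). The helper name `summarize_pr` and the sample data are hypothetical.

```python
from typing import Iterable, Mapping


def summarize_pr(commit_responses: Iterable[Mapping]) -> dict:
    """Aggregate per-commit statistics into one report row (hypothetical helper).

    Every commit counts toward 'commits'; only responses without an
    'errors' key contribute to the line counts, mirroring the script below.
    """
    row = {'commits': 0, 'loc_added': 0, 'loc_deleted': 0}
    for response in commit_responses:
        row['commits'] += 1
        if 'errors' not in response:
            row['loc_added'] += response['linesOfCode']['added']
            row['loc_deleted'] += response['linesOfCode']['deleted']
    return row


# Hypothetical sample: two commits, one without available statistics
sample = [
    {'linesOfCode': {'added': 120, 'deleted': 30}},
    {'errors': [{'message': 'hypothetical error'}]},
]
print(summarize_pr(sample))  # {'commits': 2, 'loc_added': 120, 'loc_deleted': 30}
```

Note that a commit whose statistics are missing still counts as a commit, so its lines are simply not reflected in the totals.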

How to get it

To get the report described above, we’ll run the following script, which makes requests to the Awesome Graphs and Bitbucket REST APIs and does all the calculations and aggregation for us.

import requests
import csv
import sys

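# Command-line arguments: Bitbucket URL, credentials, project key, repository slug, date range (YYYY-MM-DD)
bitbucket_url = sys.argv[1]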
login = sys.argv[2]
password = sys.argv[3]
project = sys.argv[4]
repository = sys.argv[5]
since = sys.argv[6]
until = sys.argv[7]

get_prs_url = bitbucket_url + '/rest/awesome-graphs-api/latest/projects/' + project + '/repos/' + repository \
            + '/pull-requests'

s = requests.Session()
s.auth = (login, password)

class PullRequest:

    def __init__(self, title, pr_id, author, created, closed):
        self.title = title
        self.pr_id = pr_id
        self.author = author
        self.created = created
        self.closed = closed

class PullRequestWithCommits:

    def __init__(self, title, pr_id, author, created, closed, commits, loc_added, loc_deleted):
        self.title = title
        self.pr_id = pr_id
        self.author = author
        self.created = created
        self.closed = closed
        self.commits = commits
        self.loc_added = loc_added
        self.loc_deleted = loc_deleted

# Collect all pull requests in the date range from the Awesome Graphs API, page by page
def get_pull_requests():

    pull_request_list = []

    is_last_page = False

    while not is_last_page:

        response = s.get(get_prs_url, params={'start': len(pull_request_list), 'limit': 1000,
                                      'sinceDate': since, 'untilDate': until}).json()

        for pr_details in response['values']:

            title = pr_details['title']
            pr_id = pr_details['id']
            author = pr_details['author']['user']['emailAddress']
            created = pr_details['createdDate']
            closed = pr_details.get('closedDate')  # open pull requests may have no closing date

            pull_request_list.append(PullRequest(title, pr_id, author, created, closed))

        is_last_page = response['isLastPage']

    return pull_request_list

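# For each pull request, list its commits via the Bitbucket REST API,
# then sum the lines added/deleted per commit via the Awesome Graphs API
def get_commit_statistics(pull_request_list):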

    pr_list_with_commits = []

    for pull_request in pull_request_list:

        print('Processing Pull Request', pull_request.pr_id)

        commit_ids = []

        is_last_page = False

        while not is_last_page:

            url = bitbucket_url + '/rest/api/latest/projects/' + project + '/repos/' + repository \
                + '/pull-requests/' + str(pull_request.pr_id) + '/commits'
            response = s.get(url, params={'start': len(commit_ids), 'limit': 25}).json()

            for commit in response['values']:
                commit_ids.append(commit['id'])

            is_last_page = response['isLastPage']

        commits = 0
        loc_added = 0
        loc_deleted = 0

        for commit_id in commit_ids:

            commits += 1

            url = bitbucket_url + '/rest/awesome-graphs-api/latest/projects/' + project + '/repos/' + repository \
                + '/commits/' + commit_id
            response = s.get(url).json()

            if 'errors' not in response:
                loc_added += response['linesOfCode']['added']
                loc_deleted += response['linesOfCode']['deleted']

        pr_list_with_commits.append(PullRequestWithCommits(pull_request.title, pull_request.pr_id, pull_request.author,
                                                           pull_request.created, pull_request.closed, commits,
                                                           loc_added, loc_deleted))

    return pr_list_with_commits

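# 'w' overwrites an existing report, so re-running the script does not duplicate the header or rows
with open('{}_{}_pr_size_stats_{}_{}.csv'.format(project, repository, since, until), mode='w', newline='', encoding='utf-8') as report_file: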

    report_writer = csv.writer(report_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    report_writer.writerow(['title', 'id', 'author', 'created', 'closed', 'commits', 'loc_added', 'loc_deleted'])

    for pr in get_commit_statistics(get_pull_requests()):
        report_writer.writerow([pr.title, pr.pr_id, pr.author, pr.created, pr.closed, pr.commits, pr.loc_added, pr.loc_deleted])

print('The resulting CSV file is saved to the current folder.')
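Both functions in the script page through results by passing `start` and checking `isLastPage`. That loop can be factored into a reusable generator. A minimal sketch, where `iter_paged` and `fetch_page` are hypothetical names: `fetch_page` is any callable that takes a start offset and returns one decoded JSON page. Bitbucket's paged responses include a `nextPageStart` field; since we are not certain every endpoint provides it, the sketch falls back to counting items read so far, as the script does.

```python
from typing import Callable, Iterator, Mapping


def iter_paged(fetch_page: Callable[[int], Mapping]) -> Iterator:
    """Yield every item from a paged REST resource (hypothetical helper).

    fetch_page(start) must return a page with 'values' and 'isLastPage'.
    'nextPageStart' is used when present; otherwise the next offset is
    the current offset plus the number of items on this page.
    """
    start = 0
    while True:
        page = fetch_page(start)
        values = page['values']
        yield from values
        # Stop on the last page, or on an empty page to avoid looping forever
        if page['isLastPage'] or not values:
            break
        start = page.get('nextPageStart', start + len(values))


# A fake two-page resource standing in for real HTTP calls
pages = {
    0: {'values': ['a', 'b'], 'isLastPage': False, 'nextPageStart': 2},
    2: {'values': ['c'], 'isLastPage': True},
}
print(list(iter_paged(pages.__getitem__)))  # ['a', 'b', 'c']
```

In the script, `fetch_page` could wrap the existing session call, for example `lambda start: s.get(get_prs_url, params={'start': start, 'limit': 1000, 'sinceDate': since, 'untilDate': until}).json()`.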

To make this script work, you’ll need to install the requests module in advance (for example, with pip install requests); the csv and sys modules come with Python out of the box. Then pass seven arguments to the script when executing it: the URL of your Bitbucket instance, login, password, project key, repository slug, since date, and until date, with dates in the YYYY-MM-DD format. Here’s an example:
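Seven positional arguments are easy to mix up. If you’d like `--help` output and a clear error message when an argument is missing, the `sys.argv` lines at the top of the script could be replaced with the standard `argparse` module while keeping the same command line. A sketch; the function name `parse_args` and the help texts are our own choice:

```python
import argparse


def parse_args(argv=None):
    """Parse the seven positional settings the report script expects."""
    parser = argparse.ArgumentParser(
        description='Export pull request size statistics to CSV.')
    parser.add_argument('bitbucket_url', help='base URL of your Bitbucket instance')
    parser.add_argument('login', help='Bitbucket user name')
    parser.add_argument('password', help='Bitbucket password')
    parser.add_argument('project', help='project key, e.g. PRKEY')
    parser.add_argument('repository', help='repository slug, e.g. repo-name')
    parser.add_argument('since', help='start date, YYYY-MM-DD')
    parser.add_argument('until', help='end date, YYYY-MM-DD')
    # With argv=None, argparse reads sys.argv[1:]
    return parser.parse_args(argv)


# The URL here is a hypothetical placeholder
args = parse_args(['https://bitbucket.example.com', 'login', 'password',
                   'PRKEY', 'repo-name', '2020-11-30', '2021-02-01'])
print(args.project, args.since)  # PRKEY 2020-11-30
```

The rest of the script would then read `args.bitbucket_url`, `args.project`, and so on instead of the module-level variables.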

py <script_name>.py <bitbucket_url> login password PRKEY repo-name 2020-11-30 2021-02-01
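The script passes the dates to the API as plain strings and does not check them itself, so a non-existent day such as November 31 is easy to miss. A small check with the standard `datetime` module can catch this before any request is made; the helper name `validate_range` is hypothetical:

```python
from datetime import date


def validate_range(since: str, until: str) -> tuple:
    """Return (since, until) as dates; raise ValueError if invalid or out of order."""
    start = date.fromisoformat(since)  # rejects impossible dates such as 2020-11-31
    end = date.fromisoformat(until)
    if start > end:
        raise ValueError(f'since ({since}) is after until ({until})')
    return start, end


print(validate_range('2020-11-30', '2021-02-01'))
```

Calling `validate_range(since, until)` right after reading the arguments makes the script fail fast with a readable error instead of sending a malformed query.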

When the script finishes, the resulting CSV file is saved to the current working directory, that is, the folder you ran the script from.
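Once the CSV is ready, you can compare each pull request against the 200 to 400 line guideline mentioned at the start. A minimal sketch that reads the report with `csv.DictReader` and lists pull requests whose added plus deleted lines exceed a threshold; the helper name `oversized_prs` and counting "changed lines" as added + deleted with a default cutoff of 400 are our assumptions, not part of the research or the app:

```python
import csv


def oversized_prs(report_path: str, threshold: int = 400) -> list:
    """Return (id, title, changed_lines) for PRs whose added + deleted lines exceed threshold."""
    flagged = []
    with open(report_path, newline='', encoding='utf-8') as report_file:
        # Column names match the header row the script writes
        for row in csv.DictReader(report_file):
            changed = int(row['loc_added']) + int(row['loc_deleted'])
            if changed > threshold:
                flagged.append((row['id'], row['title'], changed))
    return flagged
```

For example, `oversized_prs('PRKEY_repo-name_pr_size_stats_2020-11-30_2021-02-01.csv')` would check a report produced with the example arguments above.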

Want more?

The Awesome Graphs for Bitbucket app, and its REST API in particular, let you get much more than described here, and we want to help you get the most out of it. If you have an idea in mind or a problem you’d like us to solve, leave a comment here or create a request in our Help Center, and we’ll cover it in future posts! In fact, the idea for this very article came from our customers, so there is a good chance your case will be next.

Here are a few how-tos that you can read right now: