How to Get the Number of Commits and Lines of Code in Pull Requests

February 4, 2021

According to research by the Cisco Systems programming team into code review best practices, a pull request should contain no more than 200 to 400 lines of code. Keeping your pull requests within this limit speeds up the review, and it is also about as much information as a reviewer can process effectively at one time.

If you’d like to see how your existing pull requests measure up, counting lines of code manually for each one could take years, so we suggest automating the process with Awesome Graphs for Bitbucket and Python. This article shows how to build a report with pull request size statistics, in lines of code and commits, at the repository level.

What you will get

As a result, you’ll get a CSV file with a detailed list of the pull requests created during the specified period, along with the number of commits and the lines of code added and deleted in each of them.

How to get it

To get the report described above, we’ll run the following script, which makes requests to the REST API and does all the calculations and aggregation for us.

import requests
import csv
import sys

bitbucket_url = sys.argv[1]
login = sys.argv[2]
password = sys.argv[3]
project = sys.argv[4]
repository = sys.argv[5]
since = sys.argv[6]
until = sys.argv[7]

get_prs_url = bitbucket_url + '/rest/awesome-graphs-api/latest/projects/' + project + '/repos/' + repository \
            + '/pull-requests'

s = requests.Session()
s.auth = (login, password)


class PullRequest:

    def __init__(self, title, pr_id, author, created, closed):
        self.title = title
        self.pr_id = pr_id
        self.author = author
        self.created = created
        self.closed = closed


class PullRequestWithCommits:

    def __init__(self, title, pr_id, author, created, closed, commits, loc_added, loc_deleted):
        self.title = title
        self.pr_id = pr_id
        self.author = author
        self.created = created
        self.closed = closed
        self.commits = commits
        self.loc_added = loc_added
        self.loc_deleted = loc_deleted


def get_pull_requests():

    pull_request_list = []

    is_last_page = False

    while not is_last_page:

        response = s.get(get_prs_url, params={'start': len(pull_request_list), 'limit': 1000,
                                      'sinceDate': since, 'untilDate': until}).json()

        for pr_details in response['values']:

            title = pr_details['title']
            pr_id = pr_details['id']
            author = pr_details['author']['user']['emailAddress']
            created = pr_details['createdDate']
            # Pull requests that are still open may have no closedDate
            closed = pr_details.get('closedDate')

            pull_request_list.append(PullRequest(title, pr_id, author, created, closed))

        is_last_page = response['isLastPage']

    return pull_request_list


def get_commit_statistics(pull_request_list):

    pr_list_with_commits = []

    for pull_request in pull_request_list:

        print('Processing Pull Request', pull_request.pr_id)

        commit_ids = []

        is_last_page = False

        while not is_last_page:

            url = bitbucket_url + '/rest/api/latest/projects/' + project + '/repos/' + repository \
                + '/pull-requests/' + str(pull_request.pr_id) + '/commits'
            response = s.get(url, params={'start': len(commit_ids), 'limit': 25}).json()

            for commit in response['values']:
                commit_ids.append(commit['id'])

            is_last_page = response['isLastPage']

        commits = 0
        loc_added = 0
        loc_deleted = 0

        for commit_id in commit_ids:

            commits += 1

            url = bitbucket_url + '/rest/awesome-graphs-api/latest/projects/' + project + '/repos/' + repository \
                + '/commits/' + commit_id
            response = s.get(url).json()

            # Skip commits for which the API returns errors instead of statistics
            if 'errors' not in response:
                loc_added += response['linesOfCode']['added']
                loc_deleted += response['linesOfCode']['deleted']

        pr_list_with_commits.append(PullRequestWithCommits(pull_request.title, pull_request.pr_id, pull_request.author,
                                                           pull_request.created, pull_request.closed, commits,
                                                           loc_added, loc_deleted))

    return pr_list_with_commits


# Open in write mode so that re-running the script does not append a second header row
with open('{}_{}_pr_size_stats_{}_{}.csv'.format(project, repository, since, until), mode='w', newline='') as report_file:

    report_writer = csv.writer(report_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    report_writer.writerow(['title', 'id', 'author', 'created', 'closed', 'commits', 'loc_added', 'loc_deleted'])

    for pr in get_commit_statistics(get_pull_requests()):
        report_writer.writerow([pr.title, pr.pr_id, pr.author, pr.created, pr.closed, pr.commits, pr.loc_added, pr.loc_deleted])

print('The resulting CSV file is saved to the current folder.')

To make this script work, you’ll need to install the requests module in advance; the csv and sys modules are available in Python out of the box. Then you need to pass seven arguments to the script when you run it: the URL of your Bitbucket, login, password, project key, repository name, since date, and until date. Here’s an example:

py script.py https://bitbucket.your-company-name.com login password PRKEY repo-name 2020-11-30 2021-02-01
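
Because the since and until values are passed to the API as plain strings, it may help to check them before the script starts making requests. The sketch below assumes the dates use the YYYY-MM-DD form shown in the example command; parse_date and check_range are hypothetical helper names, not part of the original script.

```python
from datetime import date, datetime


def parse_date(value):
    """Parse a YYYY-MM-DD string; raises ValueError for impossible dates."""
    return datetime.strptime(value, '%Y-%m-%d').date()


def check_range(since, until):
    """Return (since, until) as date objects, ensuring since is not after until."""
    start, end = parse_date(since), parse_date(until)
    if start > end:
        raise ValueError('since date {} is after until date {}'.format(start, end))
    return start, end
```

Calling check_range(sys.argv[6], sys.argv[7]) near the top of the script would stop the run early on a mistyped or reversed date range.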

As the message printed at the end of the script’s run says, the resulting file is saved to the current folder, that is, the folder you ran the script from.
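
Once the report exists, it can be checked against the 200 to 400 line guideline mentioned at the start of the article. The following is a minimal sketch, assuming the column names written by the script above and a single header row in the file. Treating added plus deleted lines as the pull request size and using 400 as the default threshold are assumptions, and find_large_prs is a hypothetical helper.

```python
import csv
import io


def find_large_prs(csv_file, max_loc=400):
    """Return (id, size) pairs for pull requests whose size exceeds max_loc.

    Size is taken as loc_added + loc_deleted, which is an assumption;
    adjust it if you measure pull request size differently.
    """
    large = []
    for row in csv.DictReader(csv_file):
        size = int(row['loc_added']) + int(row['loc_deleted'])
        if size > max_loc:
            large.append((row['id'], size))
    return large


# Illustrative data in the same layout as the report (the values are made up).
sample = io.StringIO(
    'title,id,author,created,closed,commits,loc_added,loc_deleted\n'
    'Fix typo,1,a@example.com,0,0,1,3,2\n'
    'Rewrite parser,2,b@example.com,0,0,12,850,400\n'
)
print(find_large_prs(sample))
```

For a real report, pass the opened CSV file produced by the script instead of the in-memory sample.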

Want more?

The Awesome Graphs for Bitbucket app and its REST API, in particular, allow you to get much more than described here, and we want to help you get the most out of it. If you have an idea in mind or a problem that you’d like us to solve, write here in the comments or create a request in our Help Center, and we’ll cover it in future posts! In fact, the idea for this very article was brought to us by our customers, so there is a high chance that your case will be the next one.

Here are a few how-tos that you can read right now:

Related posts

    How to count lines of code in Bitbucket to decide what SonarQube license you need

    October 29, 2020

    SonarQube is a tool used to identify software metrics and technical debt in the source code through static analysis. While the Community Edition is free and open-source, the Developer, Enterprise, and Data Center editions are priced per instance per year and based on the number of lines of code (LOC). If you want to buy a license for SonarQube, you need to count lines of code for Bitbucket projects and repositories you want to analyze. 

    Awesome Graphs for Bitbucket offers you different ways of getting this information in the Data Center and Server versions. In this post, we’ll show how you can count LOC for your Bitbucket instance, projects, or repositories, using the Awesome Graphs’ REST API resources and Python.

    How to count lines of code for the whole Bitbucket instance

    Getting lines of code statistics for an instance is pretty straightforward and will only require making one call to the REST API. Here is an example of the curl command:

    curl -X GET -u username:password "https://bitbucket.your-company-name.com/rest/awesome-graphs-api/latest/commits/statistics"

    And the response will look like this:

    {
        "linesOfCode":{
            "added":5958278,
            "deleted":2970874
        },
        "commits":57595
    }
    

    It returns the number of lines added and deleted. So, to get the total, you’ll simply need to subtract the number of deleted lines from the number of added lines.
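
Using the sample response above, the subtraction works out as follows. This is a minimal sketch that parses a hard-coded copy of that response rather than calling the API:

```python
import json

# Sample response body copied from the example above.
response_text = '''
{
    "linesOfCode": {"added": 5958278, "deleted": 2970874},
    "commits": 57595
}
'''

stats = json.loads(response_text)
total_loc = stats['linesOfCode']['added'] - stats['linesOfCode']['deleted']
print(total_loc)  # 2987404
```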

    Please note that blank lines are also counted in lines of code statistics in this and the following cases.

    How to count lines of code for each project in the instance

    You can also use the REST API resource to get the LOC for a particular project, but doing this for each project in your instance will definitely take a while. That’s why we are going to automate this process with a simple Python script that runs through all of your projects, counts the total LOC for each one, and then saves the list of project keys with their total LOC to a CSV file.

    The resulting CSV has two columns, project_key and total_loc. Here is the script to get it:

    import requests
    import csv
    import sys
    
    bitbucket_url = sys.argv[1]
    bb_api_url = bitbucket_url + '/rest/api/latest'
    ag_api_url = bitbucket_url + '/rest/awesome-graphs-api/latest'
    
    s = requests.Session()
    s.auth = (sys.argv[2], sys.argv[3])
    
    def get_project_keys():
    
        projects = list()
    
        is_last_page = False
    
        while not is_last_page:
            request_url = bb_api_url + '/projects'
            response = s.get(request_url, params={'start': len(projects), 'limit': 25}).json()
    
            for project in response['values']:
                projects.append(project['key'])
            is_last_page = response['isLastPage']
    
        return projects
    
    def get_total_loc(project_key):
    
        url = ag_api_url + '/projects/' + project_key + '/commits/statistics'
        response = s.get(url).json()
        total_loc = response['linesOfCode']['added'] - response['linesOfCode']['deleted']
    
        return total_loc
    
    
    # Open in write mode so that re-running the script does not append a second header row
    with open('total_loc_per_project.csv', mode='w', newline='') as report_file:
    
        report_writer = csv.writer(report_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        report_writer.writerow(['project_key', 'total_loc'])
    
        for project_key in get_project_keys():
            print('Processing project', project_key)
            report_writer.writerow([project_key, get_total_loc(project_key)])
    

    To make this script work, you’ll need to install the requests module in advance; the csv and sys modules are available in Python out of the box. You need to pass three arguments to the script when you run it: the URL of your Bitbucket, login, and password. Here’s an example:

    py script.py https://bitbucket.your-company-name.com login password
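
If the goal is to size a SonarQube license, the per-project totals can then be summed for just the projects you plan to analyze. A minimal sketch, assuming the total_loc_per_project.csv layout produced above with a single header row; sum_loc is a hypothetical helper and the project keys and numbers are made up.

```python
import csv
import io


def sum_loc(csv_file, project_keys):
    """Sum total_loc for the given project keys; keys not in the file are ignored."""
    wanted = set(project_keys)
    return sum(int(row['total_loc'])
               for row in csv.DictReader(csv_file)
               if row['project_key'] in wanted)


# Illustrative data in the same layout as the report (the values are made up).
sample = io.StringIO('project_key,total_loc\nWEB,120000\nAPI,45000\nOPS,-300\n')
print(sum_loc(sample, ['WEB', 'API']))  # 165000
```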

    How to count lines of code for each repository in the project

    This case is very similar to the previous one, but this script will get the total LOC for each repository in the specified project. Here, the resulting CSV file will include the list of repo slugs in the specified project and their LOC totals.

    The script:

    import requests
    import csv
    import sys
    
    bitbucket_url = sys.argv[1]
    bb_api_url = bitbucket_url + '/rest/api/latest'
    ag_api_url = bitbucket_url + '/rest/awesome-graphs-api/latest'
    
    s = requests.Session()
    s.auth = (sys.argv[2], sys.argv[3])
    
    project_key = sys.argv[4]
    
    
    def get_repos(project_key):
        
        repos = list()
    
        is_last_page = False
    
        while not is_last_page:
            request_url = bb_api_url + '/projects/' + project_key + '/repos'
            response = s.get(request_url, params={'start': len(repos), 'limit': 25}).json()
            for repo in response['values']:
                repos.append(repo['slug'])
            is_last_page =  response['isLastPage']
    
        return repos
    
    
    def get_total_loc(repo_slug):
    
        url = ag_api_url + '/projects/' + project_key + \
              '/repos/' + repo_slug + '/commits/statistics'
        response = s.get(url).json()
        total_loc = response['linesOfCode']['added'] - response['linesOfCode']['deleted']
    
        return total_loc
    
    
    # Open in write mode so that re-running the script does not append a second header row
    with open('total_loc_per_repo.csv', mode='w', newline='') as report_file:
        report_writer = csv.writer(report_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        report_writer.writerow(['repo_slug', 'total_loc'])
    
        for repo_slug in get_repos(project_key):
            print('Processing repository', repo_slug)
            report_writer.writerow([repo_slug, get_total_loc(repo_slug)])
    

    You need to pass four arguments: the URL of your Bitbucket, login, password, and project key. Here’s an example:

    py script.py https://bitbucket.your-company-name.com login password PROJECTKEY

    Want to learn more?

    We should note that the total LOC we get in each case is the number of lines added minus the number of lines deleted across all branches. Because of this, some repos may have negative LOC numbers, so it might be useful to look at the LOC for the default branch and compare it to the LOC for all branches.
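
To spot the repositories that need this closer look, the per-repository report can be scanned for negative totals. A minimal sketch, assuming the total_loc_per_repo.csv layout produced above with a single header row; negative_loc_repos is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def negative_loc_repos(csv_file):
    """Return the repo slugs whose total_loc is below zero."""
    return [row['repo_slug']
            for row in csv.DictReader(csv_file)
            if int(row['total_loc']) < 0]


# Illustrative data in the same layout as the report (the values are made up).
sample = io.StringIO('repo_slug,total_loc\nfrontend,52000\nlegacy-import,-1800\n')
print(negative_loc_repos(sample))
```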

    If you would like to learn how to get this information, write here in the comments or create a request in our Help Center, and we’ll cover it in future posts!

    You can also learn how to search for commits in Bitbucket: read our blog post, which suggests three different ways to do this.
