How to Export Commit and Pull Request Data from Bitbucket to CSV

November 26, 2020
#Bitbucket#Reporting#How To
11 min

Being a universal file type, CSV serves as a go-to format for integrations between applications. It allows you to transfer large amounts of data between systems, even when a native integration isn’t available. However, you can’t export commit and pull request data from Bitbucket out of the box. The good news is that Awesome Graphs for Bitbucket gives you the capability to export to CSV in several ways.

In this article, we’ll show you how you can use the app to export engineering data to CSV for further integration, organization, and processing in analytics tools and custom solutions.

What you will get

The export options described below will give you two kinds of CSV files, depending on the type of data exported.

In the case of commit data, you’ll get a list of commits with their details:

[Screenshot: a list of commits with their details]

And the resulting CSV with a list of pull requests will look like this:

[Screenshot: the resulting CSV with a list of pull requests]
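Either file is plain CSV, so any analytics tool that reads CSV can consume it. As a minimal sketch of downstream processing with Python’s standard library (the column names here are placeholders for illustration, not the actual export schema):

```python
import csv

# Create a tiny stand-in for an exported file; the real export's
# columns may differ, so treat "id" and "state" as placeholders.
with open("pullrequests.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "state"])  # placeholder header
    writer.writerows([["1", "OPEN"], ["2", "MERGED"], ["3", "MERGED"]])

# Read the file back and count pull requests per state
with open("pullrequests.csv", newline="") as f:
    rows = list(csv.DictReader(f))

counts = {}
for row in rows:
    counts[row["state"]] = counts.get(row["state"], 0) + 1
print(counts)  # {'OPEN': 1, 'MERGED': 2}
```

The same pattern applies to the commit export: once the file is on disk, `csv.DictReader` gives you one dictionary per row, keyed by the header.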

Exporting from the People page

You can export raw commit and pull request data to CSV directly from Bitbucket. When you click All users in the People dropdown menu in the header, you’ll get to the People page with a global overview of developers’ activity in terms of commits or pull requests.

At the top-right corner, you’ll notice the Export menu, where you can choose CSV.

[Screenshot: the Export menu at the top-right corner of the People page]

By default, the page shows contributions made within a month, but you can choose a longer period up to a quarter. The filtering applies not only to the GUI but also to the data exported, so if you don’t change the timespan, you’ll get a list of commits or pull requests for the last 30 days.

Exporting via the REST API resources

Beginning with version 5.5.0, the Awesome Graphs REST API allows you to retrieve and export commit and pull request data to CSV on the global, project, repository, and user levels, using dedicated resources. This functionality aims to automate processes you used to handle manually and streamline your existing workflows.

You can access the in-app documentation (available to Awesome Graphs users) by choosing Export → REST API on the People page, or go to our documentation website.

We’ll show you two examples of the resources and how they work: one for exporting commits and another for pull requests. You’ll be able to use the rest of the resources, as they follow the same model.

Export commits to CSV

This resource exports a list of commits with their details from all Bitbucket projects and repositories to a CSV file.

Here is the curl request example:

curl -X GET -u username:password "https://bitbucket.your-company-name.com/rest/awesome-graphs-api/latest/commits/export/csv" --output commits.csv

Alternatively, you can use any REST API client like Postman or put the URL directly into your browser’s address bar (you need to be authenticated in Bitbucket in this browser), and you’ll get a generated CSV file.

By default, it exports the data for the last 30 days. You can set a time frame of up to one year (366 days) for the exported data with the sinceDate / untilDate parameters:

curl -X GET -u username:password "https://bitbucket.your-company-name.com/rest/awesome-graphs-api/latest/commits/export/csv?sinceDate=2020-10-01&untilDate=2020-10-13" --output commits.csv

For commit resources, you can also use query parameters such as merges, to filter merge or non-merge commits, and order, to specify the order in which commits are returned.
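For illustration, a request URL combining these parameters can be built like this (the values exclude and oldest are assumptions made for the sketch; check the resource documentation for the exact accepted values):

```python
from urllib.parse import urlencode

base = "https://bitbucket.your-company-name.com/rest/awesome-graphs-api/latest"

# NOTE: 'exclude' and 'oldest' are assumed values used for illustration;
# verify the accepted values in the Awesome Graphs REST API docs.
params = {
    "merges": "exclude",      # filter out merge commits
    "order": "oldest",        # return commits oldest first
    "sinceDate": "2020-10-01",
    "untilDate": "2020-10-13",
}
url = base + "/commits/export/csv?" + urlencode(params)
print(url)
```

The resulting URL can then be used in curl, Postman, or the browser exactly like the examples above.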

Read more about the resource and its parameters.

Export pull requests to CSV

The pull request resources work similarly, so to export a list of pull requests with their details from all Bitbucket projects and repositories to a CSV file, make the following curl request:

curl -X GET -u username:password "https://bitbucket.your-company-name.com/rest/awesome-graphs-api/latest/pull-requests/export/csv" --output pullrequests.csv

The sinceDate / untilDate parameters can also be applied to set a timespan of up to a year, but here you have an additional parameter, dateType, which allows you to choose either the creation date or the date of the last update as the filtering criterion. So, if you set dateType to created, only the pull requests created during the stated period will be returned, while dateType set to updated will include the pull requests that were updated within the time frame.

Another pull request specific parameter is state, which allows you to filter the response to only include open, merged, or declined pull requests.

For example, the following request will return a list of open pull requests, which were updated between October 1st and October 13th:

curl -X GET -u username:password "https://bitbucket.your-company-name.com/rest/awesome-graphs-api/latest/pull-requests/export/csv?dateType=updated&state=open&sinceDate=2020-10-01&untilDate=2020-10-13" --output pullrequests.csv

Learn more about this resource.

Integrate intelligently

While CSV is supported by many systems and is convenient to work with, it is not the only integration option the Awesome Graphs for Bitbucket app offers. Using the REST API, you can make data flow between applications and automate workflows, eliminating manual work. And we want to make it easier for you and save your time.
Let us know what integrations you are interested in, and we’ll try to bring them to you, so you don’t have to spend time and energy creating workarounds.

How to count lines of code in Bitbucket to decide what SonarQube license you need

October 29, 2020
#Bitbucket#Reporting#How To
10 min

SonarQube is a tool used to identify software metrics and technical debt in the source code through static analysis. While the Community Edition is free and open-source, the Developer, Enterprise, and Data Center editions are priced per instance per year and based on the number of lines of code (LOC). If you want to buy a license for SonarQube, you need to count the lines of code for the Bitbucket projects and repositories you want to analyze.

Awesome Graphs for Bitbucket offers you different ways of getting this information. In this post, we’ll show how you can count LOC for your Bitbucket instance, projects, or repositories, using the Awesome Graphs’ REST API resources and Python.

How to count lines of code for the whole Bitbucket instance

Getting lines of code statistics for an instance is pretty straightforward and will only require making one call to the REST API. Here is an example of the curl command:

curl -X GET -u username:password "https://bitbucket.your-company-name.com/rest/awesome-graphs-api/latest/commits/statistics"

And the response will look like this:

{
    "linesOfCode":{
        "added":5958278,
        "deleted":2970874
    },
    "commits":57595
}

It returns the number of lines added and deleted. So, to get the total, you simply subtract the number of deleted lines from the number of added lines.
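For instance, parsing the response shown above and computing the total with Python:

```python
import json

# The statistics response shown above, as returned by the endpoint
response_text = '{"linesOfCode": {"added": 5958278, "deleted": 2970874}, "commits": 57595}'
stats = json.loads(response_text)

# Total LOC = lines added minus lines deleted
total_loc = stats["linesOfCode"]["added"] - stats["linesOfCode"]["deleted"]
print(total_loc)  # 2987404
```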

Please note that blank lines are also counted in lines of code statistics in this and the following cases.

How to count lines of code for each project in the instance

You can also use the REST API resource to get the LOC for a particular project, but doing this for each project in your instance will definitely take a while. That’s why we are going to automate this process with a simple Python script that runs through all of your projects, counts the total LOC for each one, and then saves the list of project keys with their total LOC to a CSV file.

Here is the script to get it:

import requests
import csv
import sys

bitbucket_url = sys.argv[1]
bb_api_url = bitbucket_url + '/rest/api/latest'
ag_api_url = bitbucket_url + '/rest/awesome-graphs-api/latest'

s = requests.Session()
s.auth = (sys.argv[2], sys.argv[3])

def get_project_keys():

    projects = list()
    is_last_page = False

    # Page through the Bitbucket projects API, 25 projects per request,
    # until the response reports the last page
    while not is_last_page:
        request_url = bb_api_url + '/projects'
        response = s.get(request_url, params={'start': len(projects), 'limit': 25}).json()

        for project in response['values']:
            projects.append(project['key'])
        is_last_page = response['isLastPage']

    return projects

def get_total_loc(project_key):

    # Total LOC = lines added minus lines deleted
    url = ag_api_url + '/projects/' + project_key + '/commits/statistics'
    response = s.get(url).json()
    total_loc = response['linesOfCode']['added'] - response['linesOfCode']['deleted']

    return total_loc


with open('total_loc_per_project.csv', mode='w', newline='') as report_file:  # 'w' avoids duplicate headers on re-runs

    report_writer = csv.writer(report_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    report_writer.writerow(['project_key', 'total_loc'])

    for project_key in get_project_keys():
        print('Processing project', project_key)
        report_writer.writerow([project_key, get_total_loc(project_key)])

To make this script work, you’ll need to install the requests library in advance; the csv and sys modules are available in Python out of the box. When executing the script, pass three arguments: the URL of your Bitbucket instance, your login, and your password. Here’s an example:

py script.py https://bitbucket.your-company-name.com login password

How to count lines of code for each repository in the project

This case is very similar to the previous one, but this script will get the total LOC for each repository in the specified project. Here, the resulting CSV file will include the list of repo slugs in the specified project and their LOC totals:

[Screenshot: the resulting CSV with repo slugs and their total LOC]

The script:

import requests
import csv
import sys

bitbucket_url = sys.argv[1]
bb_api_url = bitbucket_url + '/rest/api/latest'
ag_api_url = bitbucket_url + '/rest/awesome-graphs-api/latest'

s = requests.Session()
s.auth = (sys.argv[2], sys.argv[3])

project_key = sys.argv[4]


def get_repos(project_key):

    repos = list()
    is_last_page = False

    # Page through the repositories of the project, 25 per request,
    # until the response reports the last page
    while not is_last_page:
        request_url = bb_api_url + '/projects/' + project_key + '/repos'
        response = s.get(request_url, params={'start': len(repos), 'limit': 25}).json()
        for repo in response['values']:
            repos.append(repo['slug'])
        is_last_page = response['isLastPage']

    return repos


def get_total_loc(repo_slug):

    # Total LOC = lines added minus lines deleted
    url = ag_api_url + '/projects/' + project_key + \
          '/repos/' + repo_slug + '/commits/statistics'
    response = s.get(url).json()
    total_loc = response['linesOfCode']['added'] - response['linesOfCode']['deleted']

    return total_loc


with open('total_loc_per_repo.csv', mode='w', newline='') as report_file:  # 'w' avoids duplicate headers on re-runs
    report_writer = csv.writer(report_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    report_writer.writerow(['repo_slug', 'total_loc'])

    for repo_slug in get_repos(project_key):
        print('Processing repository', repo_slug)
        report_writer.writerow([repo_slug, get_total_loc(repo_slug)])

You need to pass four arguments: the URL of your Bitbucket instance, your login, your password, and the project key, which will look as follows:

py script.py https://bitbucket.your-company-name.com login password PROJECTKEY

Want to learn more?

We should note that the total LOC we get in each case shows the number of lines added minus the lines deleted across all branches. Because of this, some repos may have negative LOC numbers, so it might be useful to look at the LOC for the default branch and compare it to the LOC for all branches.
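To see why a total can go negative, consider made-up statistics for a repository where a cleanup branch deletes more lines than the rest of the history ever added:

```python
# Made-up per-branch statistics for one repository (illustrative numbers)
branch_stats = {
    "main":    {"added": 1200, "deleted": 300},
    "cleanup": {"added": 50,   "deleted": 1500},  # a large deletion sweep
}

added = sum(s["added"] for s in branch_stats.values())
deleted = sum(s["deleted"] for s in branch_stats.values())
print(added - deleted)  # -550
```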

If you would like to learn how to get this information, write here in the comments or create a request in our Help Center, and we’ll cover it in future posts!

You can also check how to search for commits in Bitbucket: read our blog post that suggests three different ways to do this.