Generating graphs for git statistics

Sometimes it is interesting to look at some statistics for a git repository, e.g. the number of commits per day or the number of inserted and deleted lines. Looking at endless rows of numbers, however, is often not very satisfying, and a graphical output showing this information would be much better.

Here are two small bash scripts I wrote to do exactly that: they produce graphs showing some key statistics for a given git repository. The scripts call git log with certain parameters to generate the data to be displayed. git's output is then processed by standard Unix tools such as awk to produce the plot data, which in turn is fed to gnuplot to generate a PDF plot. For a project I am currently working on, these are the plots generated by the two scripts:

The first script generates a plot showing the number of commits per day. Note that the plot's x axis contains major tics for weeks (beginning on Mondays) and minor tics for single days.

#!/bin/bash

# get path to git repository
git_path='./.git'
if [[ $# -ge 1 ]]; then
    git_path="$1/.git"
fi

# check if git repository exists
if [ ! -d "$git_path" ]; then
    echo "there is no git repository in $git_path"
    exit 1
fi

# generate the data needed for the plot
# (sort first: uniq -c only merges adjacent lines, and author dates are not guaranteed to be contiguous in the log)
git_data=$(git --git-dir="$git_path" log --date=short --all --pretty=format:'%ad' | sort | uniq -c | awk '{print $2 " " $1}')

# get the date of the monday before the first commit
# we will use this to separate weeks in the generated plot
# (the data is sorted by date, so the first row is the earliest day)
day_first_commit=$(echo "$git_data" | head -n1 | awk '{print $1}')
monday_before_first_commit=$(date -d "$day_first_commit -$(date -d "$day_first_commit" +%u) days + 1 day" "+%Y-%m-%d")

# pipe data to a gnuplot script to produce a pdf plot
echo "$git_data" | gnuplot -e "\
    set xdata time;\
    set timefmt '%Y-%m-%d';\
    set boxwidth 86400;\
    set terminal pdf enhanced font 'TexGyreSchola,9';\
    set xtics format '%b %d' rotate by 55 right;\
    set style fill solid border lc rgb '#D47400';\
    set xrange [\"$monday_before_first_commit\":];\
    set xtics \"$monday_before_first_commit\", 604800 scale 3, 1 nomirror;\
    set tics front;\
    set yrange [0:];\
    set grid lc rgb '#666666';\
    plot '<cat' using (timecolumn(1)+24*60*60/2):2 with boxes title '' lc rgb '#FFB823';"
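
The two data-preparation steps of the script above (counting commits per day and finding the Monday that starts the first week) can be tried in isolation. The following is only a minimal sketch: the function names commits_per_day and monday_on_or_before and the sample dates are made up for illustration, and GNU date is assumed, since the -d option is not available in BSD/macOS date.

```shell
#!/bin/bash

# Count how often each date occurs on stdin and print "date count" rows.
# uniq -c only merges adjacent duplicates, so the input is sorted first.
commits_per_day() {
    sort | uniq -c | awk '{print $2 " " $1}'
}

# Print the Monday on or before the given YYYY-MM-DD date.
# Assumption: GNU date (-d with a relative expression).
monday_on_or_before() {
    day="$1"
    weekday=$(date -d "$day" +%u)   # 1 = Monday ... 7 = Sunday
    date -d "$day -$((weekday - 1)) days" +%Y-%m-%d
}

# made-up sample dates, deliberately out of order
printf '2024-05-15\n2024-05-14\n2024-05-15\n2024-05-17\n' | commits_per_day

# 2024-05-15 is a Wednesday
monday_on_or_before 2024-05-15
```

Feeding the output of git log --date=short --pretty=format:'%ad' into commits_per_day should give the same "date count" rows that the script passes to gnuplot.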

The second script shows the number of inserted and deleted lines per day, boxes for both numbers being "stacked" such that the total number of changes can be seen. Again, major tics mark the beginning of weeks, whereas minor tics mark single days.

#!/bin/bash

# get path to git repository
git_path='./.git'
if [[ $# -ge 1 ]]; then
    git_path="$1/.git"
fi

# check if git repository exists
if [ ! -d "$git_path" ]; then
    echo "there is no git repository in $git_path"
    exit 1
fi

# generate the data needed for the plot
git_data=$(git --git-dir="$git_path" log --all --date=short --numstat -C --format=format:'%ad')$'\n'

# sum up all inserted and deleted lines for each commit
# after this step git_data contains one row per commit of the following format: "date insertions deletions"
read_date=true
insertions=0
deletions=0
git_data=$(echo "$git_data" | while read -r line ; do
    # check if the current line belongs to the same commit as the line before (i.e. line is not empty)
    if [ -n "$line" ] ; then

        # check if we have to read the date (in the first line for each commit)
        if [ "$read_date" = true ] ; then
            # read date
            date=$line
            read_date=false
        else
            # split the numstat row into its first two fields (the path is ignored)
            read -r added removed _ <<< "$line"
            # git reports "-" for binary files; count those as 0
            [ "$added" = "-" ] && added=0
            [ "$removed" = "-" ] && removed=0
            # sum up number of inserted and deleted lines
            insertions=$((insertions+added))
            deletions=$((deletions+removed))
        fi

    else
        # log for commit done -> output and reset variables
        echo "$date $insertions $deletions"
        insertions=0
        deletions=0
        read_date=true
    fi
done)

# get the date of the monday before the first commit
# we will use this to separate weeks in the generated plot
# (sort the dates so that the earliest day is picked even if the log is not in author-date order)
day_first_commit=$(echo "$git_data" | awk '{print $1}' | sort | head -n1)
monday_before_first_commit=$(date -d "$day_first_commit -$(date -d "$day_first_commit" +%u) days + 1 day" "+%Y-%m-%d")

# sum up all inserted and deleted lines for each _day_
git_data=$(echo "$git_data" | awk '{day_insert[$1]+=$2; day_delete[$1]+=$3} END { for (d in day_insert) print d " " day_insert[d] " " day_delete[d] }')

# write data to a temporary file such that it can be read by gnuplot multiple times
temporary_file=$(mktemp)
echo "$git_data" > "$temporary_file"

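# plot the data from the temporary file: the total (insertions + deletions) is drawn first, the insertions on top of it
gnuplot -e "\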
    set xdata time;\
    set timefmt '%Y-%m-%d';\
    set boxwidth 86400;\
    set terminal pdf enhanced font 'TexGyreSchola,9';\
    set xtics format '%b %d' rotate by 55 right;\
    set style fill solid border lc rgb '#282828';\
    set xrange [\"$monday_before_first_commit\":];\
    set xtics \"$monday_before_first_commit\", 604800 scale 3, 1 nomirror;\
    set tics front;\
    set yrange [0:];\
    set grid lc rgb '#666666';\
    plot '$temporary_file' using (timecolumn(1)+24*60*60/2):(\$2+\$3) with boxes title '' lc rgb '#FD2106', '$temporary_file' using (timecolumn(1)+24*60*60/2):2 with boxes title '' lc rgb '#79BA00';"

# remove the temporary file
rm "$temporary_file"
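
As an alternative to the while loop in the script above, the per-day sums can also be computed in a single awk pass. This is a sketch under assumptions rather than a drop-in replacement: the function name sum_numstat_per_day is made up, and the input is assumed to look like the output of git log --date=short --numstat --format=format:'%ad', i.e. a date line followed by tab-separated "added, deleted, path" rows, with blank lines between commits.

```shell
#!/bin/bash

# Sum insertions and deletions per day from "git log --numstat" style input
# and print "date insertions deletions" rows, sorted by date.
sum_numstat_per_day() {
    awk -F'\t' '
        # a line that consists only of a date starts a new commit
        /^[0-9]+-[0-9]+-[0-9]+$/ { day = $0; seen[day] = 1; next }
        # numstat rows have at least three tab-separated fields;
        # git prints "-" for binary files, which awk treats as 0 in arithmetic
        NF >= 3 { ins[day] += $1; del[day] += $2 }
        END { for (d in seen) printf "%s %d %d\n", d, ins[d], del[d] }
    ' | sort
}

# made-up sample input: two commits on one day (one touching a binary file)
# and one commit on the day before
printf '2024-05-15\n3\t1\ta.txt\n-\t-\tlogo.png\n\n2024-05-15\n2\t0\tb.txt\n\n2024-05-14\n10\t4\ta.txt\n' \
    | sum_numstat_per_day
```

The trailing sort is not needed by gnuplot for a box plot, but it makes the earliest day easy to pick with head -n1.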

When running these scripts, a path to a git repository can be given as the first argument. If no argument is given, the current working directory is analyzed. Each script writes a PDF file to stdout, so you might want to redirect the output to a file or pipe it into a PDF viewer. Note that the scripts use the -d option of date, which requires GNU date.

$ ./git_commits.sh path_to_repo > stats.pdf
$ ./git_commits.sh path_to_repo | zathura -
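
Since the scripts depend on git, gnuplot, awk and GNU date, it can be handy to check for them before plotting. The following is only a sketch; the helper names missing_tools and has_gnu_date are made up for illustration and are not part of the scripts above.

```shell
#!/bin/bash

# Print the name of every command from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Succeed if date accepts -d with a relative expression (as GNU date does).
has_gnu_date() {
    date -d '2024-05-15 -2 days' +%Y-%m-%d >/dev/null 2>&1
}

# report problems on stderr so that stdout stays free for the PDF
missing=$(missing_tools git gnuplot awk date)
if [ -n "$missing" ]; then
    echo "missing tools:" $missing >&2
fi
has_gnu_date || echo "date does not support -d (GNU date is needed)" >&2
```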