Python supports a number of statistical, mathematical, and scientific packages and modules that you can use for data analysis. One of the well-known packages is SciPy. It provides a number of scientific tools such as FFT, linear algebra, signal processing, and statistics, to name a few.

This post shows how you can use Python with packages such as SciPy and matplotlib to do statistical analysis of your blog's web statistics.

If you have a Google Analytics account for your blog or website, you can download key statistical tables in CSV format. You can then use Python with SciPy and matplotlib to plot the data in the downloaded CSV file. What you plot depends on the data you are interested in. In this example, we will plot the CTR (Click Through Rate) against the blog post number. The blog post number is simply 1, 2, 3, 4, and so forth, where 1 is the label for one blog post, 2 is the label for another, and so on.

Here is what the CSV file looks like.


When you download the Google Analytics CSV file, the CTR is given as a percentage. As you can see above, the CTR values here were converted to plain floating point numbers. This was done because the percentage symbol causes trouble when importing the file into the Python workspace with the genfromtxt() function. If you want percentage values, you can multiply the column by 100. However, since the correspondence between the percentage and floating point representations is easy to see, we will work directly with the file as shown above.
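The conversion from percentage strings to floating point values can also be scripted. The following is only a minimal sketch: the sample rows are made up, and the two-column layout (post number, CTR in percent) and the helper name percent_to_fraction are assumptions, not something taken from the real Google Analytics export.

```python
import csv
import io


def percent_to_fraction(text):
    """Convert a string such as '12.5%' to the float 0.125 (hypothetical helper)."""
    return float(text.strip().rstrip('%')) / 100.0


# Made-up sample standing in for the downloaded file: post number, CTR in percent.
raw = io.StringIO("1,12.5%\n2,40%\n3,100%\n")

# Parse each row and turn the percentage string into a fraction.
rows = [(int(number), percent_to_fraction(ctr)) for number, ctr in csv.reader(raw)]

# Write the cleaned rows back out in the same two-column layout.
cleaned = io.StringIO()
csv.writer(cleaned, lineterminator="\n").writerows(rows)
print(cleaned.getvalue())
```

With a real file you would open the downloaded CSV instead of the in-memory sample and write the cleaned rows to a file such as CTR.csv.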

Once the data is in the correct format, we can write the program to import and plot it. The following Python program imports and plots the data.

import numpy as np
import matplotlib.pyplot as plt

# genfromtxt is a NumPy function. Older SciPy releases re-exported it as
# scipy.genfromtxt, but recent SciPy releases no longer do.
data = np.genfromtxt('CTR.csv', delimiter=",")

x = data[:, 0]  # blog post number
y = data[:, 1]  # CTR as a fraction between 0 and 1

plt.scatter(x, y)

plt.title("CTR of web pages")
plt.xlabel("Blog post No. label")
plt.ylabel("CTR")
plt.autoscale(tight=True)
plt.grid()
plt.show()

The first two code lines import the numpy and matplotlib.pyplot packages. NumPy, the array library that SciPy is built on, provides the genfromtxt function used to import the data, and matplotlib.pyplot is used for basic plotting.

Then we create a data object using np.genfromtxt(). The first parameter is the filename, CTR.csv, and the delimiter is a comma. We then store the first column (index 0) as the x values and the second column (index 1) as the y values. The x values are the numbers from 1 to 800 that stand for the blog post names. The y values are the CTR in floating point format.
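To see what the import step produces without needing the real file, here is a small sketch that runs genfromtxt on an in-memory sample. The three rows are made up and only mimic the assumed layout of post number followed by CTR.

```python
import io

import numpy as np

# Made-up rows in the assumed layout: post number, CTR as a fraction.
sample = io.StringIO("1,0.125\n2,0.4\n3,1.0\n")

# genfromtxt returns a 2-D float array with one row per line of the file.
data = np.genfromtxt(sample, delimiter=",")
print(data.shape)

x = data[:, 0]  # first column: post numbers
y = data[:, 1]  # second column: CTR values
print(x, y)
```

If the real file starts with a header row, genfromtxt reads that row as NaN values when parsing floats; passing skip_header=1 skips it.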

Then we create a simple scatter plot with plt.scatter(x, y). After that we give the figure a title, label the x and y axes, autoscale the plot, turn on the grid, and show the figure. The following is the figure we get.



The scatter plot shows that most of the blog posts have a CTR of less than 0.4, or 40%. The figure also shows that there are blog posts numbered between 420 and 800 with a CTR of 1, or 100%.
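Observations like these can also be checked numerically rather than read off the plot. The sketch below uses made-up CTR values purely for illustration; with the real data you would use the x and y arrays loaded from CTR.csv.

```python
import numpy as np

# Made-up CTR values for illustration only; in practice use y = data[:, 1].
y = np.array([0.05, 0.10, 0.35, 0.20, 1.0, 0.45, 0.15, 1.0])
x = np.arange(1, len(y) + 1)  # post numbers 1, 2, 3, ...

# The mean of a boolean mask is the fraction of True entries.
share_below = np.mean(y < 0.4)

# Boolean indexing picks the post numbers whose CTR equals 1.
posts_at_one = x[y == 1.0]

print(f"Share of posts with CTR below 0.4: {share_below:.3f}")
print("Posts with CTR of 1:", posts_at_one)
```

The exact comparison y == 1.0 works here because the values are literal; for values that result from arithmetic, np.isclose(y, 1.0) is the safer test.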

In this way we can use Python's scientific packages such as SciPy, together with matplotlib for plotting, to do statistical analysis of web or blog posts.
