Statistics is an important branch of mathematics. It lets you summarize the data you have collected and extract the information embedded in it. NumPy is a Python package optimized for working with arrays. It provides a number of statistical functions, such as mean, median, standard deviation, variance, and correlation.

If you need to do statistical analysis on your data, you can use NumPy for that. Before you do, you need data, and you need to know how to import data into NumPy and export data from it. We showed how to do this in the previous two blog posts.
Simple Statistical Analysis of CSV Data with NumPy

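Suppose you have some data in a CSV file called mydata.csv and you want to perform various statistical analyses on it. The examples below call NumPy functions by their bare names, so we assume they have been imported into the namespace first. Then we import the data as follows:

[In] from numpy import *

[In] sale, cost = loadtxt(r'D:\mydata.csv', skiprows = 1, delimiter = ",", usecols = (1,2), unpack = True)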
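The effect of the loadtxt( ) arguments can be tried without a file on disk. The following is a minimal sketch, assuming a CSV layout with a header row and an id column followed by the sale and cost columns; the in-memory text and the column names are hypothetical stand-ins for mydata.csv.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for mydata.csv (assumed layout:
# a header row, then an id column followed by sale and cost columns).
csv_text = """id,sale,cost
1,5.4,9050
2,6.5,3400
3,23.2,2300
"""

# skiprows=1 drops the header row, usecols=(1, 2) keeps the second and
# third columns, and unpack=True returns one array per selected column.
sale, cost = np.loadtxt(io.StringIO(csv_text), skiprows=1, delimiter=",",
                        usecols=(1, 2), unpack=True)

print(sale)
print(cost)
```

Without unpack = True, loadtxt( ) would return a single two-dimensional array with one row per CSV line instead of one array per column.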

The columns are now stored in the NumPy arrays (ndarray objects) sale and cost. We can view them by simply entering their names:

[In] sale
array([  5.4,   6.5,  23.2,  64.2,   0.2,   7.3,  84.3,   5.2,   9.5,
         8.4,  56.2,  65.3,   6.2,   5.3,  67.3,   9.7,   7.5,   2.3,
         5. ,  10.3])

[In] cost
array([ 9050.,  3400.,  2300.,  6030.,  5030.,  9030.,  2040.,  1020.,
        7030.,  5023.,  1003.,  4060.,  3090.,  6540.,  8234.,  2349.,
        9843.,  4394.,  8924.,  8524.])

Calculate the number of items 

We can use the len( ) function to calculate the number of items in each column.

[In] len(cost)
[Out] 20

[In] len(sale)
[Out] 20


Mean

The mean( ) function can be used to calculate the mean of sale and cost:

[In] sale_mean = mean(sale)

[In] sale_mean
[Out] 22.465000000000003

[In] cost_mean = mean(cost)

[In] cost_mean
[Out] 5345.6999999999998
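As a quick check on what mean( ) computes, the arithmetic mean is just the sum of the values divided by their count. The sketch below repeats the cost values from this post and compares the manual calculation with NumPy's result; the name manual_mean is only illustrative.

```python
import numpy as np

# cost values copied from the example data in this post
cost = np.array([9050., 3400., 2300., 6030., 5030., 9030., 2040., 1020.,
                 7030., 5023., 1003., 4060., 3090., 6540., 8234., 2349.,
                 9843., 4394., 8924., 8524.])

# Arithmetic mean by hand: sum of the items divided by the number of items.
manual_mean = cost.sum() / len(cost)

# np.mean should agree with the manual calculation (about 5345.7).
numpy_mean = np.mean(cost)
print(manual_mean, numpy_mean)
```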

Other kinds of averages can also be calculated, such as the volume weighted average price (VWAP) and the time weighted average price (TWAP).

Here we show the VWAP calculation. It uses the average( ) function with the weights argument, so that each cost value is weighted by the corresponding sale value:

[In] vwap = average(cost, weights= sale)

[In] vwap
[Out] 4525.3087024259949
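A weighted average is the sum of each value times its weight, divided by the sum of the weights. The following sketch, using the same sale and cost values as above, spells that formula out and compares it with average( ); the name manual_vwap is only illustrative.

```python
import numpy as np

# sale and cost values copied from the example data in this post
sale = np.array([5.4, 6.5, 23.2, 64.2, 0.2, 7.3, 84.3, 5.2, 9.5, 8.4,
                 56.2, 65.3, 6.2, 5.3, 67.3, 9.7, 7.5, 2.3, 5.0, 10.3])
cost = np.array([9050., 3400., 2300., 6030., 5030., 9030., 2040., 1020.,
                 7030., 5023., 1003., 4060., 3090., 6540., 8234., 2349.,
                 9843., 4394., 8924., 8524.])

# Weighted average by hand: sum(weight * value) / sum(weight).
manual_vwap = (sale * cost).sum() / sale.sum()

# np.average with weights implements the same formula (about 4525.31).
vwap = np.average(cost, weights=sale)
print(manual_vwap, vwap)
```

Rows with a large sale value pull the result toward their cost, which is why the weighted figure differs from the plain mean of cost.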

Maximum and Minimum Value

The maximum and minimum values can be found using the max( ) and min( ) functions.

Maximum and minimum values of cost:

[In] cost_max = max(cost)

[In] cost_max
[Out] 9843.0

[In] cost_min = min(cost)

[In] cost_min
[Out] 1003.0


Range

The range is the difference between the maximum and minimum values. It can be calculated using the ptp( ) function, where ptp stands for peak to peak. For example, the range, or peak-to-peak value, of the cost column is:

[In] cost_ptp = ptp(cost)

[In] cost_ptp
[Out] 8840.0
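The peak-to-peak value can be cross-checked against the maximum and minimum directly. The sketch below also looks up where those extremes occur with argmax( ) and argmin( ), which this post does not otherwise use; it assumes the same cost values as above.

```python
import numpy as np

# cost values copied from the example data in this post
cost = np.array([9050., 3400., 2300., 6030., 5030., 9030., 2040., 1020.,
                 7030., 5023., 1003., 4060., 3090., 6540., 8234., 2349.,
                 9843., 4394., 8924., 8524.])

# Peak to peak is simply maximum minus minimum.
cost_range = np.ptp(cost)
manual_range = cost.max() - cost.min()

# Positions (0-based indices) of the largest and smallest values.
idx_max = int(np.argmax(cost))
idx_min = int(np.argmin(cost))

print(cost_range, manual_range, idx_max, idx_min)
```

Storing the result under a name other than ptp keeps the function itself available for later calls.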


Median

The function median( ) calculates the median in NumPy.

[In] cost_median = median(cost)

[In] cost_median
[Out] 5026.5
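The median is the middle value of the sorted data. With an even number of items, as here, it is the average of the two middle values. A minimal sketch of that calculation, assuming the same cost values as above:

```python
import numpy as np

# cost values copied from the example data in this post
cost = np.array([9050., 3400., 2300., 6030., 5030., 9030., 2040., 1020.,
                 7030., 5023., 1003., 4060., 3090., 6540., 8234., 2349.,
                 9843., 4394., 8924., 8524.])

sorted_cost = np.sort(cost)
n = len(sorted_cost)

# For an even count, average the two middle items; for an odd count,
# take the single middle item.
if n % 2 == 0:
    manual_median = (sorted_cost[n // 2 - 1] + sorted_cost[n // 2]) / 2
else:
    manual_median = sorted_cost[n // 2]

print(manual_median, np.median(cost))
```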

Standard Deviation

The function std( ) calculates the standard deviation in NumPy. By default it divides by the number of items N, which gives the population standard deviation.

[In] cost_std = std(cost)

[In] cost_std
[Out] 2845.1059927531696


Variance

The function var( ) calculates the variance in NumPy.

[In] cost_var = var(cost)

[In] cost_var
[Out] 8094628.1099999994
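The standard deviation is the square root of the variance, and both std( ) and var( ) accept a ddof argument that changes the divisor from N to N - ddof. The sketch below, assuming the same cost values as above, shows the default (population) results next to the sample versions obtained with ddof=1.

```python
import numpy as np

# cost values copied from the example data in this post
cost = np.array([9050., 3400., 2300., 6030., 5030., 9030., 2040., 1020.,
                 7030., 5023., 1003., 4060., 3090., 6540., 8234., 2349.,
                 9843., 4394., 8924., 8524.])

# Default ddof=0: divide by N (population variance / standard deviation).
pop_var = np.var(cost)
pop_std = np.std(cost)

# ddof=1: divide by N - 1 (sample variance / standard deviation).
sample_var = np.var(cost, ddof=1)
sample_std = np.std(cost, ddof=1)

# Variance by hand: mean of the squared deviations from the mean.
manual_var = ((cost - cost.mean()) ** 2).mean()

print(pop_var, pop_std)
print(sample_var, sample_std)
```

Which divisor is appropriate depends on whether the data is treated as the whole population or as a sample from a larger one.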


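Covariance

The covariance between two datasets can be calculated in NumPy using the cov( ) function. This is illustrated with the sale and cost columns. The result is a 2 x 2 matrix: the diagonal holds the variance of each column and the off-diagonal entries hold the covariance between them. Note that cov( ) divides by N - 1 by default, whereas var( ) divides by N, so the diagonal entry for cost differs from the var(cost) result above.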

[In] covar = cov(sale, cost)

[In] covar
array([[  7.52131868e+02,  -1.94000953e+04],
       [ -1.94000953e+04,   8.52066117e+06]])
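The covariance matrix can be taken apart to see what each entry means, and it leads directly to the correlation mentioned at the start of this post. The sketch below assumes the same sale and cost values as above and uses corrcoef( ), which this post does not otherwise demonstrate, to confirm that the correlation is the covariance divided by the product of the two sample standard deviations.

```python
import numpy as np

# sale and cost values copied from the example data in this post
sale = np.array([5.4, 6.5, 23.2, 64.2, 0.2, 7.3, 84.3, 5.2, 9.5, 8.4,
                 56.2, 65.3, 6.2, 5.3, 67.3, 9.7, 7.5, 2.3, 5.0, 10.3])
cost = np.array([9050., 3400., 2300., 6030., 5030., 9030., 2040., 1020.,
                 7030., 5023., 1003., 4060., 3090., 6540., 8234., 2349.,
                 9843., 4394., 8924., 8524.])

covar = np.cov(sale, cost)

# Diagonal entries: sample variances (divisor N - 1) of sale and cost.
var_sale, var_cost = covar[0, 0], covar[1, 1]

# Off-diagonal entries: covariance between sale and cost (symmetric).
cov_sale_cost = covar[0, 1]

# Correlation = covariance / (std of sale * std of cost), using the
# same N - 1 divisor for the standard deviations.
manual_corr = cov_sale_cost / np.sqrt(var_sale * var_cost)
corr = np.corrcoef(sale, cost)[0, 1]

print(cov_sale_cost, manual_corr, corr)
```

The negative covariance here means that, in this data, larger sale values tend to go with smaller cost values; the correlation expresses the same relationship on a scale from -1 to 1.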

