Percentiles with NumPy

First, a warning. Don't mix up finding the percentage of elements in an array that fall below some value with finding a percentile of the array. I've already covered the code for getting percentages:
find_percentages = np.mean(array_name < 81)
which returns the fraction of elements in the array that are less than 81, as a number between 0 and 1. Multiply it by 100 to get the percentage.
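To make that concrete, here is a small sketch. The array name scores and its values are made up by me for illustration. The comparison scores < 81 gives an array of True/False values, and np.mean of that array is the fraction of values that are True.

```python
import numpy as np

# Hypothetical data, invented for this example
scores = np.array([70, 75, 80, 81, 85, 90, 95, 100])

# scores < 81 is a boolean array; its mean is the fraction of True values
fraction_below_81 = np.mean(scores < 81)

# Multiply by 100 to express that fraction as a percentage
percent_below_81 = fraction_below_81 * 100

print(fraction_below_81)  # 0.375 (3 of the 8 values are below 81)
print(percent_below_81)   # 37.5
```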

NumPy has a function, np.percentile, to find percentiles from arrays.

Its two main arguments are the array you are exploring and the percentile you would like to retrieve from it. The code looks like this:
find_percentile = np.percentile(array_name, 40)
The output is the fortieth percentile of the array.
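Here is a quick sketch of np.percentile in action. The array readings is my own made-up example, not from any course material. As far as I understand it, when the requested percentile falls between two data points, NumPy by default interpolates linearly between them.

```python
import numpy as np

# Hypothetical data, invented for this example (11 evenly spaced values)
readings = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110])

# A single percentile: the value below which about 40% of the data falls
fortieth = np.percentile(readings, 40)
print(fortieth)  # 50.0

# Several percentiles at once: pass a list as the second argument
quartiles = np.percentile(readings, [25, 50, 75])
print(quartiles)  # [35. 60. 85.]
```

Notice the difference from the percentage code: there you supply a value and get back a share of the data, while here you supply a share (as a number from 0 to 100) and get back a value.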

Quartiles, Interquartile Range and Median

Codecademy has given this elaboration of percentiles: 

Some percentiles have specific names:
  • The 25th percentile is called the first quartile
  • The 50th percentile is called the median
  • The 75th percentile is called the third quartile
The minimum, first quartile, median, third quartile, and maximum of a dataset are called a five-number summary. This set of numbers is a great thing to compute when we get a new dataset.
The difference between the first and third quartile is a value called the interquartile range. For example, say we have the following array:
d = [1, 2, 3, 4, 4, 4, 6, 6, 7, 8, 8]
We can calculate the 25th and 75th percentiles using np.percentile:
np.percentile(d, 25)
>>> 3.5
np.percentile(d, 75)
>>> 6.5
Then to find the interquartile range, we subtract the value of the 25th percentile from the value of the 75th:
6.5 - 3.5 = 3
50% of the dataset will lie within the interquartile range. The interquartile range gives us an idea of how spread out our data is. The smaller the interquartile range value, the less variance in our dataset. The greater the value, the larger the variance.
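Putting the pieces together, here is my own sketch (not Codecademy's code) that computes the five-number summary and the interquartile range for the same array d. The variable names are just ones I picked.

```python
import numpy as np

# The example array from the explanation above
d = [1, 2, 3, 4, 4, 4, 6, 6, 7, 8, 8]

# Five-number summary: minimum, first quartile, median, third quartile, maximum
minimum = np.min(d)
first_quartile = np.percentile(d, 25)
median = np.percentile(d, 50)  # same result as np.median(d)
third_quartile = np.percentile(d, 75)
maximum = np.max(d)

# Interquartile range: third quartile minus first quartile
iqr = third_quartile - first_quartile

print(minimum, first_quartile, median, third_quartile, maximum)  # 1 3.5 4.0 6.5 8
print(iqr)  # 3.0
```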
