Jamesy Mc Jamesface: programming


For Loops in JavaScript

Below is an example of a for loop in JavaScript. Have a look and see if you understand how it works:

var fruits = ["peach", "orange", "apple"];

// Start i at 0, keep looping while i is less than the array's length,
// and add 1 to i at the end of each pass.
for (var i = 0; i < fruits.length; i++)
{
    // We are now inside our loop.
    document.write("value of i is: " + i);

    // This is the HTML for a new line.
    document.write("<br>");

    document.write("the fruit at i is: " + fruits[i]);

    document.write("<br>");
}

Percentiles with NumPy

First, a warning. Don't confuse finding the percentage of elements in a list that fall below some value x with finding the value that sits at a given percentile. I've already covered how to get percentages:

find_percentages = np.mean(array_name < 81)

which returns the proportion of elements in the array that are less than 81, as a number between 0 and 1 (multiply by 100 to get a percentage).

NumPy has a function for finding percentiles of arrays: np.percentile.

It takes two main arguments: the array you are exploring, and the percentile you would like to retrieve from it. The code looks like this:

find_percentile = np.percentile(array_name, 40)

The output will be the fortieth percentile of this array.
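To see the difference between a percentage and a percentile side by side, here is a minimal sketch using a made-up array of test scores (the names scores, share_below_81 and p40, and the values, are hypothetical):

```python
import numpy as np

# Hypothetical test scores, used only for illustration.
scores = np.array([55, 62, 70, 78, 81, 85, 90, 95])

# Percentage: what fraction of the scores are below 81?
share_below_81 = np.mean(scores < 81)

# Percentile: which score sits at the 40th percentile?
p40 = np.percentile(scores, 40)

print(share_below_81)  # 0.5, i.e. 50% of the scores are below 81
print(p40)
```

By default np.percentile interpolates between neighbouring values, so here the 40th percentile (about 76.4) falls between 70 and 78 rather than being one of the scores in the array.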

Quartiles, Interquartile Range and Median

Codecademy gives this explanation of percentiles:

Some percentiles have specific names:

The 25th percentile is called the first quartile
The 50th percentile is called the median
The 75th percentile is called the third quartile

The minimum, first quartile, median, third quartile, and maximum of a dataset are called a five-number summary. This set of numbers is a great thing to compute when we get a new dataset.

The difference between the first and third quartile is a value called the interquartile range. For example, say we have the following array:

d = [1, 2, 3, 4, 4, 4, 6, 6, 7, 8, 8]

We can calculate the 25th and 75th percentiles using np.percentile:

>>> np.percentile(d, 25)
3.5
>>> np.percentile(d, 75)
6.5

Then to find the interquartile range, we subtract the value of the 25th percentile from the value of the 75th:

6.5 - 3.5 = 3

Roughly 50% of the dataset lies within the interquartile range, between the first and third quartiles. The interquartile range gives us an idea of how spread out the middle of our data is: the smaller the interquartile range, the more tightly the middle half of the data is bunched together; the larger it is, the more spread out the data.
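The whole five-number summary, along with the interquartile range, can be computed in a few lines. This is a minimal sketch reusing the array d from the example above; the variable names are my own:

```python
import numpy as np

d = [1, 2, 3, 4, 4, 4, 6, 6, 7, 8, 8]

# np.percentile accepts a list of percentiles and returns one value per entry.
q1, median, q3 = np.percentile(d, [25, 50, 75])
iqr = q3 - q1

# Five-number summary: minimum, first quartile, median, third quartile, maximum.
five_number_summary = (np.min(d), q1, median, q3, np.max(d))

print(five_number_summary)
print("IQR:", iqr)
```

Passing all three percentiles in one call keeps the quartiles consistent with each other and avoids repeating np.percentile three times.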

NumPy: Averages of Data Sets

What do the different types of average tell us about a dataset?

Mean == the "centre" ("center") of a dataset.

If you have an array array_1 = np.array([1, 2, 3, 4, 5]), its mean is 3, because 1 + 2 + 3 + 4 + 5 = 15, and 15 / 5 (the number of elements in the array) = 3. So 3 is the average, or mean, of this particular array.
The mean is affected by outliers.

Median == the "middle" of a dataset.

If you have a list [1, 1, 4, 7, 8, 9, 9], then 7 is the median of the list, because it is the middle value once the list is sorted: three values lie below it and three lie above it.
If you have a list whose length is an even number, say [1, 2, 2, 3, 4, 5, 5, 7] (8 numbers), then the median is the half-way point between the two middle numbers (in this case 3 & 4), so the median of the list above would be 3.5.
Of course, we're likely to be dealing with very large lists and arrays, so working out the middle numbers ourselves would become a very tedious task. We can overcome this by using the np.median function.
The median is much less affected by outliers than the mean.
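Here is a minimal sketch that reuses array_1 from above and adds a single hypothetical outlier (the value 500, chosen only for illustration) to show how differently the mean and median react. It also checks the two median examples from the text with np.median:

```python
import numpy as np

array_1 = np.array([1, 2, 3, 4, 5])
# The same data, but with the 5 replaced by a hypothetical outlier.
with_outlier = np.array([1, 2, 3, 4, 500])

print(np.mean(array_1), np.median(array_1))            # mean 3.0, median 3.0
print(np.mean(with_outlier), np.median(with_outlier))  # mean 102.0, median 3.0

# The two median examples from the text:
odd_median = np.median([1, 1, 4, 7, 8, 9, 9])          # 7.0, the middle value
even_median = np.median([1, 2, 2, 3, 4, 5, 5, 7])      # 3.5, halfway between 3 and 4
```

One extreme value drags the mean from 3 to 102, while the median stays at 3.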

Finding Percentages:

You can use NumPy's mean function, np.mean, together with a comparison operator to work out percentages from a given dataset. For example, if you have array_example = np.array([15, 18, 9, 5, 4, 21, 10, 16]) and you want to find the percentage of elements greater than 10, you can use:

>>> np.mean(array_example > 10)
0.5

The above result is 0.5 or 50%.

Why does this work? The comparison array_example > 10 is applied to every element of the array at once and produces an array of booleans: True where an element is greater than 10, and False where it is equal to or less than 10. When np.mean averages these booleans, it treats True as 1 and False as 0, so the result is the number of True values divided by the number of elements in the array (in this case 4 / 8 = 0.5). In other words, 50% of the elements in the boolean array are True, which is the same as saying 50% of the elements in the original array are greater than 10.
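To make the intermediate step visible, this minimal sketch prints the boolean array that the comparison produces before np.mean averages it (the names mask and count_true are my own):

```python
import numpy as np

array_example = np.array([15, 18, 9, 5, 4, 21, 10, 16])

# Element-wise comparison: one True/False per element.
mask = array_example > 10
print(mask)  # [ True  True False False False  True False  True]

# Summing booleans counts the True values (True counts as 1).
count_true = int(mask.sum())  # 4

# Averaging booleans gives the proportion of True values: 4 / 8.
share = np.mean(mask)  # 0.5
```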

NumPy: Outliers and Sorting

Sometimes a dataset contains elements that are unusually large or small compared with the rest. In an array of heights, for example, we may see values that are implausibly short or implausibly tall. These elements are known as outliers.

We can identify outliers more easily by sorting the data with NumPy's sort function, np.sort(heights_array), which puts the smallest values at the start and the largest at the end. We can then begin to see where possible errors or anomalies lie. Heights between 120cm and 190cm are perfectly plausible, but a smallest measurement of 10cm or a tallest measurement of 1200cm is very unlikely to be accurate.
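A minimal sketch of this, using a hypothetical heights_array with two deliberately bad values. The 50cm and 250cm cut-offs are assumptions made up for this example, not a standard rule:

```python
import numpy as np

# Hypothetical height measurements in centimetres, including two likely errors.
heights_array = np.array([172, 165, 10, 181, 158, 1200, 169, 176])

# Sorting puts the extremes at either end of the array.
sorted_heights = np.sort(heights_array)
print(sorted_heights)
print(sorted_heights[0])   # smallest value: 10
print(sorted_heights[-1])  # largest value: 1200

# Assumed plausibility bounds, chosen only for this example.
too_small = sorted_heights < 50
too_large = sorted_heights > 250
suspicious = sorted_heights[too_small | too_large]
print(suspicious)          # the 10 and the 1200
```

Sorting only helps you look; deciding whether a value is an error still needs knowledge of what the data is supposed to measure.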

Shebang Line

The shebang line should be the first line of your Python program.

This is what Al Sweigart, author of Automate the Boring Stuff with Python, says about the shebang line.

The first line of all your Python programs should be a shebang line, which tells your computer that you want Python to execute this program. The shebang line begins with #!, but the rest depends on your operating system.

On Windows, the shebang line is #! python3.

On OS X, the shebang line is #! /usr/bin/env python3.

On Linux, the shebang line is #! /usr/bin/python3.

You will be able to run Python scripts from IDLE without the shebang line, but the line is needed to run them from the command line.
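As a minimal sketch, here is a tiny script (the file name hello.py and the function main are hypothetical) that starts with the OS X-style shebang quoted above, which also works on most Linux systems. To Python itself the shebang is just a comment; it is the operating system, or on Windows the py launcher, that reads it:

```python
#!/usr/bin/env python3
# hello.py: a minimal script whose very first line is a shebang.
# Python ignores the line above as a comment; the OS (or the Windows
# py launcher) uses it to decide which interpreter runs this file.


def main() -> str:
    message = "Hello from a script with a shebang line!"
    print(message)
    return message


if __name__ == "__main__":
    main()
```

On OS X or Linux you would typically make the file executable with chmod +x hello.py and then run it as ./hello.py. On Windows, running py hello.py lets the launcher choose an interpreter based on the shebang.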

SQLite strftime() Function

Did you know that strftime() is an SQLite function that allows the programmer to return a formatted date?

It takes two main arguments, a format string and a time value (usually a column), optionally followed by modifiers:

strftime(format, column)

To get an hour: strftime('%H', column_name)
To get the year: strftime('%Y', column_name)
To get the month: strftime('%m', column_name)
To get the day: strftime('%d', column_name)
To get the minute: strftime('%M', column_name)
To get the second: strftime('%S', column_name)

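The above works as long as the column stores its times in a format SQLite understands, such as YYYY-MM-DD HH:MM:SS. Note that the letter case matters: '%m' is the month, while '%M' is the minute.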

You can read more about this function in the SQLite documentation on date and time functions.
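To try these formats without setting up a database server, this minimal sketch uses Python's built-in sqlite3 module with an in-memory database. The table orders and the column created_at are hypothetical names invented for the example:

```python
import sqlite3

# In-memory database with a hypothetical table and column, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, '2019-03-07 14:05:09')")

row = conn.execute(
    """
    SELECT strftime('%Y', created_at),  -- year
           strftime('%m', created_at),  -- month (lowercase m)
           strftime('%d', created_at),  -- day
           strftime('%H', created_at),  -- hour
           strftime('%M', created_at),  -- minute (uppercase M)
           strftime('%S', created_at)   -- second
    FROM orders
    """
).fetchone()

print(row)  # ('2019', '03', '07', '14', '05', '09')
conn.close()
```

strftime() returns text, so the results come back as zero-padded strings rather than numbers.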

TypeError: 'generator' object is not subscriptable

The code below, from chapter 12, produces the following error: "TypeError: 'generator' object is not subscriptable".

>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.active
>>> sheet = wb.active
>>> sheet.columns[1]
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    sheet.columns[1]
TypeError: 'generator' object is not subscriptable
>>>

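RESOLUTION:
Convert sheet.columns into a list before indexing it:
list(sheet.columns)[1]
This overcomes the generator error. Apparently, newer versions of openpyxl return a generator from sheet.columns rather than a list, so the book's original indexing example is out of date.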
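The same error can be reproduced without openpyxl or a spreadsheet. In this minimal sketch, the generator function columns() is a hypothetical stand-in for sheet.columns. It shows why indexing fails and two ways around it; the second, itertools.islice, is an alternative to the list() fix above, not what the book uses:

```python
from itertools import islice


def columns():
    """Hypothetical stand-in for sheet.columns: yields one tuple per column."""
    yield ("A1", "A2", "A3")
    yield ("B1", "B2", "B3")
    yield ("C1", "C2", "C3")


# Indexing a generator directly fails, just like sheet.columns[1]:
error_message = ""
try:
    columns()[1]
except TypeError as err:
    error_message = str(err)
print(error_message)  # 'generator' object is not subscriptable

# Fix 1: materialise the generator into a list, then index it.
second_column = list(columns())[1]

# Fix 2: skip the first item lazily, without building the whole list.
second_column_lazy = next(islice(columns(), 1, None))
```

A generator produces its items one at a time and has no random access, which is why [1] fails. list() is the simplest fix; islice avoids holding every column in memory, which may matter for very large sheets.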

Trouble Using Pip and Installing Openpyxl

Openpyxl is, according to Automate the Boring Stuff with Python, a handy Python library that lets Python programs read and modify Microsoft Excel spreadsheets.

But first you need pip installed and working on your machine.

What is pip?

"Pip is a package manager for Python packages. A package contains all the files you need for a module. Modules are Python code libraries that you can include in your project." - W3Schools.

The first difficulty I had was getting pip to work from my command line. I resolved that issue by adjusting my PATH settings, as described in this video (https://www.youtube.com/watch?v=Jw_MuM2BOuI).

The next issue I had was with installing openpyxl itself (which you need pip to do). I received the following error:

"Could not install packages due to an EnvironmentError: [Errno 13] Permission denied:"

"Consider using the `--user` option or check the permissions"

Eventually, I found a solution on Stack Overflow, which suggested using the following:

python -m pip install --user openpyxl

This worked.
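Running pip as python -m pip also makes sure the package is installed for the same Python interpreter you then run, which sidesteps some PATH problems. To check that the install worked, and to see the per-user directory that --user normally installs into, a small standard-library sketch can help. The helper name is_installed is hypothetical:

```python
import importlib.util
import site


def is_installed(package_name: str) -> bool:
    """Return True if this interpreter can find the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# The per-user site-packages directory (where `pip install --user` normally puts packages).
user_site = site.getusersitepackages()

print("User site-packages:", user_site)
print("openpyxl installed:", is_installed("openpyxl"))
```

If this prints False for openpyxl even after a successful install, the package most likely went into a different Python installation from the one running the check.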
