We can more easily identify outliers by using the NumPy sort function np.sort(heights_array). We can then begin to identify where possible errors or anomalies lie. You can get people who are between 120cm and 190cm, but it is unlikely that the smallest measurement of 10cm, or the tallest measurement of 1200cm are accurate.
NumPy: Outliers and Sorting
Sometimes, from the range of a given dataset, we will see elements that are unusually larger or smaller than the other elements. In an array of heights, for example, we may see numbers that are very short, or very tall. These elements are known as outliers.
We can more easily identify outliers by using the NumPy sort function np.sort(heights_array). We can then begin to identify where possible errors or anomalies lie. You can get people who are between 120cm and 190cm, but it is unlikely that the smallest measurement of 10cm, or the tallest measurement of 1200cm are accurate.
We can more easily identify outliers by using the NumPy sort function np.sort(heights_array). We can then begin to identify where possible errors or anomalies lie. You can get people who are between 120cm and 190cm, but it is unlikely that the smallest measurement of 10cm, or the tallest measurement of 1200cm are accurate.
Python Pyperclip
Problems installing pyperclip module for python. Not recognised as a module.
Not able to install it from command line. Resolved by running pip install pyperclip by running command line as administrator. (Right-click on command line icon in start menu and select "run as administrator").
Not able to install it from command line. Resolved by running pip install pyperclip by running command line as administrator. (Right-click on command line icon in start menu and select "run as administrator").
Remember this about joins in SQL...
You may be confused by the differences between joins in SQL.
A standard JOIN or inner-join in SQL will join tables where the rows are exactly matching on the column that you are joining on. It will automatically not include non-matching rows, so that you are only presented with rows that are consistent. The benefit of this is that you are only presented with consistent and accurate information. The disadvantage is that you are missing information from some items.
However, a LEFT JOIN will join tables on a column where rows may not be matching (inconsistency between tables can be caused by one table being updated, but corresponding information in another table not being updated). In this scenario, you are given all information, but some attributes may be listed as NULL.
A CROSS JOIN creates a Cartesian Product. This means that it will allow us to combine all rows of one table with all rows of another table. If there are three rows in table A, and three rows in table B, all three rows of table A will be joined with all three rows of table B. The results of this join will have 9 rows.
A standard JOIN or inner-join in SQL will join tables where the rows are exactly matching on the column that you are joining on. It will automatically not include non-matching rows, so that you are only presented with rows that are consistent. The benefit of this is that you are only presented with consistent and accurate information. The disadvantage is that you are missing information from some items.
However, a LEFT JOIN will join tables on a column where rows may not be matching (inconsistency between tables can be caused by one table being updated, but corresponding information in another table not being updated). In this scenario, you are given all information, but some attributes may be listed as NULL.
![]() |
Image above from codecademy.com https://s3.amazonaws.com/codecademy-content/courses/learn-sql/multiple-tables/left-join.gif |
A CROSS JOIN creates a Cartesian Product. This means that it will allow us to combine all rows of one table with all rows of another table. If there are three rows in table A, and three rows in table B, all three rows of table A will be joined with all three rows of table B. The results of this join will have 9 rows.
Shebang Line
The Shebang Line should be the first line of your python program.
This is what Automate the Boring Stuff with Python author Al Sweigart says about the Shebang line.
The first line of all your Python programs should be a shebang line, which tells your computer that you want Python to execute this program. The shebang line begins with #!, but the rest depends on your operating system.You will be able to run Python scripts from IDLE without the shebang line, but the line is needed to run them from the command line.
- On Windows, the shebang line is #! python3.
- On OS X, the shebang line is #! /usr/bin/env python3.
- On Linux, the shebang line is #! /usr/bin/python3.
SQLite strftime() Function
Did you know that strftime() is an SQLite function than allows the programmer to return a formatted date.
It takes two arguments:
strftime(format, column)
- To get an hour: strftime('%H', column_name)
- To get the year: strftime('%Y', column_name)
- To get the month: strftime('%m', column_name)
- To get the day: strftime('%d', column_name)
- To get the minute: strftime('%m', column_name)
- To get the second: strftime('%S', column_name)
The above is true as long as the time format is YYYY-MM-DD HH:MM:SS
More on this function can be read from the SQL documentation here.
File management using openpyxl
When using openpyxl to load workbooks, ensure that your excel file is saved in the same folder as the .py file you are developing.
import openpyxl, pprintprint('Opening workbook... ')wb = openpyxl.load_workbook('censuspopdata.xlsx')sheet = wb.get_sheet_by_name('Population by Census Tract')Otherwise, the following "file not found" error will result:
"FileNotFoundError: [Errno 2] No such file or directory: 'censuspopdata.xlsx'"
TypeError: 'generator' object is not subscriptable
Error in chapter 12, which produces the following "TypeError: 'generator' object is not subscriptable" for the code below.
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.active
>>> sheet = wb.active
>>> sheet.columns[1]
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
sheet.columns[1]
TypeError: 'generator' object is not subscriptable
>>>
RESOLUTION:
Create a list for the sheet.colums:
list(sheet.columns)[1] to overcome the generator error - outdated method since python 2 apparently.
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.active
>>> sheet = wb.active
>>> sheet.columns[1]
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
sheet.columns[1]
TypeError: 'generator' object is not subscriptable
>>>
RESOLUTION:
Create a list for the sheet.colums:
list(sheet.columns)[1] to overcome the generator error - outdated method since python 2 apparently.
More openpyxl learnings
Continuing our journey through Automate the Boring Stuff with Python's excel lessons, we came across the get_column_letter and column_index_from_string openpyxl functions. When attemtping to import the functions, we were given the following error:
>>> import openpyxl
fr
>>> from openpyxl.cell import get_column_letter, column_index_from_string
Traceback (most recent call last):
File "<pyshell#8>", line 1, in <module>
from openpyxl.cell import get_column_letter, column_index_from_string
ImportError: cannot import name 'get_column_letter' from 'openpyxl.cell' (C:\Users\james\AppData\Roaming\Python\Python37\site-packages\openpyxl\cell\__init__.py)
>>> get_column_letter(1)
Traceback (most recent call last):
File "<pyshell#9>", line 1, in <module>
get_column_letter(1)
NameError: name 'get_column_letter' is not defined
>>>
=============================== RESTART: Shell ===============================
>>> import openpyxl
f
>>>
=============================== RESTART: Shell ===============================
>>> import openpyxl
>>> from openpyxl.cell import get_column_letter
Traceback (most recent call last):
File "<pyshell#12>", line 1, in <module>
from openpyxl.cell import get_column_letter
ImportError: cannot import name 'get_column_letter' from 'openpyxl.cell' (C:\Users\james\AppData\Roaming\Python\Python37\site-packages\openpyxl\cell\__init__.py)
Well, apparently the Automate the Boring Stuff with Python is a little out of date now! According to a StackOverflow contributor,
fr
>>> from openpyxl.cell import get_column_letter, column_index_from_string
Traceback (most recent call last):
File "<pyshell#8>", line 1, in <module>
from openpyxl.cell import get_column_letter, column_index_from_string
ImportError: cannot import name 'get_column_letter' from 'openpyxl.cell' (C:\Users\james\AppData\Roaming\Python\Python37\site-packages\openpyxl\cell\__init__.py)
>>> get_column_letter(1)
Traceback (most recent call last):
File "<pyshell#9>", line 1, in <module>
get_column_letter(1)
NameError: name 'get_column_letter' is not defined
>>>
=============================== RESTART: Shell ===============================
>>> import openpyxl
f
>>>
=============================== RESTART: Shell ===============================
>>> import openpyxl
>>> from openpyxl.cell import get_column_letter
Traceback (most recent call last):
File "<pyshell#12>", line 1, in <module>
from openpyxl.cell import get_column_letter
ImportError: cannot import name 'get_column_letter' from 'openpyxl.cell' (C:\Users\james\AppData\Roaming\Python\Python37\site-packages\openpyxl\cell\__init__.py)
Well, apparently the Automate the Boring Stuff with Python is a little out of date now! According to a StackOverflow contributor,
The functionThe .utils instead of .cell directory worked.get_column_letter
has been relocated in Openpyxl version 2.4 fromopenpyxl.cell
toopenpyxl.utils
.
The current import is:from openpyxl.utils import get_column_letter
Trouble Using Pip and Installing Openpyxl
Openpyxl is, according to Automate the Boring Stuff with Python, supposed to be a handy python library which allows programmers to use python programs with Microsoft excel sheets.
But first you need to install pip on your machine.
What is pip?
The first difficulty I had was getting pip to work through my command line. I resolved that issue using paths - as described in this video (https://www.youtube.com/watch?v=Jw_MuM2BOuI).
The next issue I had was with installing openpyxl itself (which you need pip to do). I received the following error:
Eventually, I found a solution on stackoverflow, which suggested using the following:
But first you need to install pip on your machine.
What is pip?
"Pip is a package manager for python packages. A package contains all the files you need for a module. Modules are Python code libraries that you can include in your project". - W3 schools.
The first difficulty I had was getting pip to work through my command line. I resolved that issue using paths - as described in this video (https://www.youtube.com/watch?v=Jw_MuM2BOuI).
The next issue I had was with installing openpyxl itself (which you need pip to do). I received the following error:
"Could not install packages due to an EnvironmentError: [Errno 13] Permission denied:"
"Consider using the `--user` option or check the permissions"
Eventually, I found a solution on stackoverflow, which suggested using the following:
This worked.python -m pip install --user openpyxl
Subscribe to:
Posts (Atom)
Web Development: Organizing Files and Folders
When you begin to build your website, it's a very clever idea to organize your files and folders efficiently. You should have: A ...
-
Continuing our journey through Automate the Boring Stuff with Python's excel lessons, we came across the get_column_letter and column_in...
-
What are the meanings of the various types of averages in datasets? Mean == the "centre" ("center") of a dataset. ...
-
First, a warning. Don't get mixed-up between finding the percentage of x in a list and finding a percentile of x. I've already co...