About me:
Librarian, Research & Data Services team, SMU Libraries.
Bachelor of IT, MSc in Info Studies.
Have been with SMU since the pandemic era (2021).
About this workshop:
Live-coding format; code along with me!
Goal of workshop: to give you enough fundamentals (at least to the point where ChatGPT can’t bluff you so easily) and the confidence to explore Python on your own.
Comes with 3 quizzes that you can do at home - one quiz after each session, except for the last one.
Don’t be afraid to ask for help! We are all here to learn.
Live coding & lots of hands-on
Take-home quiz after session 1, 2, and 3 to reinforce the learning - Link will be put on the course website.
Only for session 4: a small group activity in the second half of the session.
Choose an economic indicator from Federal Reserve Economic Data (FRED) e.g. Civilian Unemployment Rate.
Apply your Python knowledge to create a “storyboard” with visualizations to derive your economic outlook.
Introduction to Visual Studio Code
Introduction to Python - objects, values, and types
Handling Lists
Handling Loops and Conditionals
Q: What is Python, actually?
A: Python is a general-purpose programming language used for a wide variety of applications: websites, games, CAD applications, web applications, AI, etc.
Q: Why should I learn Python?
A: It’s one of the most popular programming languages and the dominant language in the AI, ML, and data analytics fields. It’s a useful skill to have if you plan to go into those fields or anything adjacent.
If it’s a .py file…
A regular Python file.
A plain-text file that contains only Python code.
Can be edited and run in Visual Studio Code, PyCharm, Spyder, etc., or from the terminal/command prompt.
If it’s a .ipynb file…
ipynb stands for IPython Notebook (Interactive Python Notebook).
Contains the notebook code, the execution results, and other internal settings in a specific format.
Can be edited and run in Jupyter Notebook/Lab, Google Colab, and also Visual Studio Code (with the Jupyter extension).
If it’s a .py file…
For “production” uses, e.g. creating apps and industrial deployments.
Executing the file runs all the code contained in the file.
If it’s a .ipynb file…
Used a lot for academic / scientific purposes; great for quick experiments or teaching / presentation.
Code is executed on a per-block basis (we will see this in action later on).
On your laptop, navigate to where you usually keep your files, and create a folder called 2024-09-python-workshop. This folder will be our “workspace” where we keep our scripts and data files.
Inside your workspace folder, create the following sub-folders:
data - we will save our raw data here. It’s best practice to keep the data here untouched.
data-output - if we need to modify raw data, store the modified version here.
fig-output - we will save all the graphics we create here!
Start VS Code.
Go to File > Open Folder.
Open the 2024-09-python-workshop folder we just created.
A virtual environment ensures that any packages we install are isolated from other environments, including the global interpreter. This helps prevent issues caused by conflicting package versions, which is especially useful for reproducibility!
Open the Command Palette (Ctrl + Shift + P on Windows, Cmd + Shift + P on macOS).
Type Python: Create Environment, and select the command. Choose Venv.
Open the Command Palette (Ctrl + Shift + P on Windows, Cmd + Shift + P on macOS).
Type Python: New Python File, and select the command.
Rename the file to 01-intro.py.
Object: In Python, everything is an object. An object is a piece of data that can have attributes and methods associated with it.
Value: The actual data stored in a variable or object. It can be a number, string, list, or any other data type.
Variable: A value that has a name associated with it.
Assign: To give a value to a variable. This is done using the equals sign (=).
Function: A reusable block of code that performs a specific task. Functions can take inputs (arguments) and return outputs.
Call: To execute or run a function. This is done by using the function name followed by parentheses.
Arguments: Values passed to a function when it is called. These values are used by the function to perform its task.
Parameters: Variables defined in a function’s declaration that act as placeholders for arguments. They specify what kind of data the function expects to receive.
Packages: Collections of related modules that extend Python’s capabilities. They can be installed and imported to add new functionality to your workspace.
Hands-on session - make sure to open the .py file we just created.
Type the following in your .py file…
…and then click on the run button to run the script.
You should see “Hello World!” and a 10 printed in the Terminal window at the bottom.
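The snippet itself is not reproduced in these notes. A minimal script that would produce the output described above (an assumption, not necessarily the exact code typed in the session) might look like this:

```python
# Assumed reconstruction: print a piece of text, then a number.
print("Hello World!")
print(10)
```

Each call to print() writes its argument on a new line in the terminal.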
Functions have () at the end of their name.
print() is a function that prints things as text.
Variables are names for values.
In Python, the = symbol assigns the value on the right to the variable on the left.
Once you declare a variable, you must assign a value to it.
Variable names criteria:
Use meaningful names (e.g. quarter1_growth instead of just q1g).
Variables must be created before they are used.
If a variable doesn’t exist yet, or if the name has been misspelled, Python reports an error. For example, if you simply declare a variable called name without assigning any value to it, Python will not like that and reports an error!
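A small sketch of this behaviour, assuming the variable is used before anything has been assigned to it. The try/except wrapper is only there so the script keeps running and we can read the message; the value "Alex" is made up for illustration.

```python
# Using a name before any value is assigned to it raises a NameError.
try:
    print(name)  # 'name' has not been created yet
except NameError as error:
    error_message = str(error)
    print("Python reported:", error_message)

# Creating the variable first fixes the problem.
name = "Alex"  # hypothetical value for illustration
print(name)
```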
We can pass variables to functions, e.g. let’s pass the greetings variable to the print() function. This should print out the value of greetings.
Hello World!
today's greetings: Hello World!
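Code along these lines would produce the two lines of output above. The value of greetings is inferred from that output; the exact statements used in the session are an assumption.

```python
greetings = "Hello World!"  # value inferred from the output shown above

print(greetings)                         # pass the variable on its own
print("today's greetings:", greetings)   # pass a label and the variable together
```

When print() receives several arguments, it separates them with a single space.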
We can also update the value contained inside a variable. Let’s try adding 10 to the variable age.
You can also assign the value of one variable to another variable.
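A short sketch of both ideas. The starting value of age and the name current_age are hypothetical, chosen only for illustration.

```python
age = 25         # hypothetical starting value
age = age + 10   # the right-hand side is evaluated first, then stored back in age
print("Age after adding 10:", age)

current_age = age   # copy the value of age into another variable
print("Current age:", current_age)
```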
What is the final value of the variable position below?
What is the final value of the variable country_1 below?
Non-Continuous Data
Continuous Data
Boolean (bool): True or False. Examples: True, False
Because a value’s data type determines what the program can do to it.
For example, the calculations below work on the int data type.
But the above operation won’t work on a string.
However, you can use “+” and “*” on strings.
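The three points above can be sketched as follows. The variable names and values are hypothetical, and the failing operation is wrapped in try/except so the rest of the script still runs.

```python
quarters = 4
growth_per_quarter = 2
print(quarters * growth_per_quarter - 1)   # arithmetic works on int values

label = "GDP"
try:
    print(label - 1)   # subtraction is not defined for a str
except TypeError as error:
    print("TypeError:", error)

print(label + " growth")   # + joins two strings together
print("=" * 10)            # * repeats a string
```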
Some data types need to be converted before we can do some operations on them, e.g. this will give you an error.
Here is how you can convert the data types:
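A sketch of a failing operation and the conversions that fix it, using the built-in int(), float(), and str() functions. The variable names and values are made up for illustration.

```python
year_text = "2024"   # a str that only looks like a number

try:
    next_year = year_text + 1   # str + int raises a TypeError
except TypeError as error:
    print("TypeError:", error)

next_year = int(year_text) + 1           # str -> int
rate = float("3.2")                      # str -> float
message = "Growth: " + str(rate) + "%"   # float -> str

print(next_year)
print(message)
```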
Take note of the execution order of your variables! They only change value when something is assigned to them.
public_holiday = 11 # initial value
new_holidays = public_holiday + 5
public_holiday = 10 # we update the variable
print("Public holiday:", public_holiday)
print("New holidays:", new_holidays)
Public holiday: 10
New holidays: 16
Updating the value of public_holiday will not auto-update the value of new_holidays.
What is the data type of each of these variables?
An example is round(), which will round a float number to the nearest integer.
formal definition, but often used interchangeably:
We used print() earlier.
To learn more about a function, use the help() function. Calling help(round) should tell you more about round() once executed.
Other than the basic 4 types that we learned earlier, Python has more data structures where we can store multiple values in a single variable:
List: [1, 2, 3], ['a', 'b', 'c']
Tuple: (1, 2, 3), ('a', 'b', 'c')
Dictionary: {'name': 'John', 'age': 30}
Set: {1, 2, 3}
We will not be covering all of the above, only lists and dictionaries (for session 4).
Scenario: We want to track Singapore’s GDP year-over-year (YoY) growth from 2024 to 2018.
You can also initialize an empty list to fill later.
As a first step, it’s good to check the number of items you have in your list!
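A sketch of creating a list, creating an empty list, and checking their lengths with len(). The seven values are taken from the outputs printed later in this section (index 2 is still 2.8 here, before it is corrected); the name future_growth is hypothetical.

```python
# Values taken from the outputs shown later in this section.
quarterly_gdp_growth = [-2, -5.8, 2.8, 2.6, 7.3, 1.1, 3.4]

# An empty list that can be filled later (hypothetical name).
future_growth = []

print("Number of items:", len(quarterly_gdp_growth))
print("Number of items in the empty list:", len(future_growth))
```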
Sometimes you don’t need the entire list. To retrieve a single value from a list, simply specify its index number enclosed in square brackets.
print('GDP growth rate in 2023 Q1 (1st item):', quarterly_gdp_growth[0])
print('GDP growth rate in 2024 Q2 (6th item):', quarterly_gdp_growth[5])
GDP growth rate in 2023 Q1 (1st item): -2
GDP growth rate in 2024 Q2 (6th item): 1.1
If we use an index number that is equal to or larger than the number of items, Python will give us an error! (Indexing starts at 0, so the last valid index is the number of items minus 1.)
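A minimal sketch of that error, assuming the seven-item list used in this section. The try/except wrapper is only there so the script does not stop at the error.

```python
quarterly_gdp_growth = [-2, -5.8, 2.8, 2.6, 7.3, 1.1, 3.4]
last_valid_index = len(quarterly_gdp_growth) - 1   # indices run from 0 to 6

try:
    print(quarterly_gdp_growth[10])   # there is no item at index 10
except IndexError as error:
    error_message = str(error)
    print("IndexError:", error_message)
```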
Slicing allows you to extract a portion of the list by specifying a range of indices. The basic syntax for slicing is: list[start:end:step]
start is the index where the slice starts (inclusive)
end is the index where the slice ends (exclusive)
If start is omitted, the slice begins from the start of the list
If end is omitted, the slice goes to the end of the list
step is optional; it determines how many items to skip
Specifying the start and end:
Omitting the end index:
From Q3 2023 onwards: [2.8, 2.6, 7.3, 1.1, 3.4]
Omitting the start:
If we use a negative index, Python will count from the end of the list instead of the beginning!
You can also use a step value to skip items!
It is particularly useful if you need to reverse a list.
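The slicing variations above can be sketched in one place. The list values are the ones used in this section before the 2023 Q3 correction; the variable names on the left are hypothetical.

```python
quarterly_gdp_growth = [-2, -5.8, 2.8, 2.6, 7.3, 1.1, 3.4]

first_three = quarterly_gdp_growth[0:3]      # indices 0, 1, 2 (index 3 is excluded)
from_q3_2023 = quarterly_gdp_growth[2:]      # from index 2 to the end of the list
first_two = quarterly_gdp_growth[:2]         # from the start up to index 1
last_two = quarterly_gdp_growth[-2:]         # negative index counts from the end
every_other = quarterly_gdp_growth[::2]      # a step of 2 takes every second item
reversed_list = quarterly_gdp_growth[::-1]   # a step of -1 walks the list backwards

print("From Q3 2023 onwards:", from_q3_2023)
print("Reversed:", reversed_list)
```

Slicing always returns a new list, so quarterly_gdp_growth itself is left unchanged.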
List values can be replaced by simply assigning a new value to them! Let’s update the growth rate for 2023 Q3 (index 2) with the correct figure.
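A sketch of the update, assuming the corrected figure is -2.8, as suggested by the list printed in the next step:

```python
quarterly_gdp_growth = [-2, -5.8, 2.8, 2.6, 7.3, 1.1, 3.4]

print("Before the correction:", quarterly_gdp_growth[2])
quarterly_gdp_growth[2] = -2.8   # replace the item at index 2 (2023 Q3)
print("After the correction:", quarterly_gdp_growth[2])
```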
Let’s append a new data point to the list: the 3.2% growth that we get for 2024 Q3. Appending will put this new item at the end of the list.
print('GDP growth rates initially:', quarterly_gdp_growth)
quarterly_gdp_growth.append(3.2)
print('GDP growth rates after adding 2024 Q3 prediction:', quarterly_gdp_growth)
GDP growth rates initially: [-2, -5.8, -2.8, 2.6, 7.3, 1.1, 3.4]
GDP growth rates after adding 2024 Q3 prediction: [-2, -5.8, -2.8, 2.6, 7.3, 1.1, 3.4, 3.2]
Let’s add data to the start of the list! The data is growth for 2022 Q4, which was 9.4%.
print('GDP growth rates initially:', quarterly_gdp_growth)
quarterly_gdp_growth.insert(0, 9.4)
print('GDP growth rates after adding 2022 Q4:', quarterly_gdp_growth)
GDP growth rates initially: [-2, -5.8, -2.8, 2.6, 7.3, 1.1, 3.4, 3.2]
GDP growth rates after adding 2022 Q4: [9.4, -2, -5.8, -2.8, 2.6, 7.3, 1.1, 3.4, 3.2]
Let’s remove the two items we added previously!
There are two ways to remove items: using del or the .pop() method.
print('GDP growth rates initially:', quarterly_gdp_growth)
del quarterly_gdp_growth[0]
quarterly_gdp_growth.pop()
print('GDP growth rates after removing projection:', quarterly_gdp_growth)
With del, you need to specify the index of the item you’d like to remove.
pop() will remove the item at the end of the list.
GDP growth rates initially: [9.4, -2, -5.8, -2.8, 2.6, 7.3, 1.1, 3.4, 3.2]
GDP growth rates after removing projection: [-2, -5.8, -2.8, 2.6, 7.3, 1.1, 3.4]
Let’s say we want to calculate the average quarterly GDP growth:
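One way to do this, sketched with the built-in sum() and len() functions (the session may have used a different approach). The names total_growth and average_growth are hypothetical.

```python
quarterly_gdp_growth = [-2, -5.8, -2.8, 2.6, 7.3, 1.1, 3.4]

total_growth = sum(quarterly_gdp_growth)                     # add up all the items
average_growth = total_growth / len(quarterly_gdp_growth)    # divide by the item count

print("Average quarterly GDP growth:", round(average_growth, 2), "%")
```

Here round() is given a second argument, 2, to keep two decimal places.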
You can include values of different types in a list, though for GDP data we’ll stick to numbers!
Write code to complete the following tasks:
Scenario: We have Singapore’s Quarterly GDP growth rates from 2023 Q1 to 2024 Q2, and we want to print them in this format: Quarterly GDP growth: 5%
Manually printing out the items
print("Quarterly GDP growth:", quarterly_gdp_growth[0], "%")
print("Quarterly GDP growth:", quarterly_gdp_growth[1], "%")
print("Quarterly GDP growth:", quarterly_gdp_growth[2], "%")
print("Quarterly GDP growth:", quarterly_gdp_growth[3], "%")
print("Quarterly GDP growth:", quarterly_gdp_growth[4], "%")
# and so on...
# can you imagine doing this for 100++ items?
Quarterly GDP growth: -2 %
Quarterly GDP growth: -5.8 %
Quarterly GDP growth: -2.8 %
Quarterly GDP growth: 2.6 %
Quarterly GDP growth: 7.3 %
With a for loop
Quarterly GDP growth: -2 %
Quarterly GDP growth: -5.8 %
Quarterly GDP growth: -2.8 %
Quarterly GDP growth: 2.6 %
Quarterly GDP growth: 7.3 %
Quarterly GDP growth: 1.1 %
Quarterly GDP growth: 3.4 %
Quarterly GDP growth: 3.3 %
Quarterly GDP growth: 3.6 %
As you can see, we can achieve the same thing (or more) with less code by using a for loop.
Anatomy of loops explained using the example above:
growth is the loop variable, representing the value that changes with each iteration of the loop. You can think of it as the “current item” being processed. Meanwhile, quarterly_gdp_growth is the collection that the loop iterates over. The loop statement also ends with a colon (:).
print("Quarterly GDP growth:", growth, "%") is the body of the loop, specifying the action to take for each item in the collection. The body of the loop must be indented.
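Putting those pieces together gives the loop below. The list values are taken from the nine-line output shown above.

```python
quarterly_gdp_growth = [-2, -5.8, -2.8, 2.6, 7.3, 1.1, 3.4, 3.3, 3.6]

for growth in quarterly_gdp_growth:               # loop statement, ends with a colon
    print("Quarterly GDP growth:", growth, "%")   # indented body, runs once per item
```

The same two lines of loop code work whether the list holds 9 items or 900.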
range()
Below is a loop that will print out numbers from 1 to 6:
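A minimal sketch of such a loop. Note that range(1, 7) stops before 7, so the last number printed is 6; the loop variable name number is arbitrary.

```python
for number in range(1, 7):   # start at 1, stop before 7
    print(number)
```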
enumerate()
We can use enumerate() to loop over not only the items inside a list, but also the index position of each item in the list!
Let’s say we want to re-format our printout to 1) Quarterly GDP growth: 5%
for index, item in enumerate(quarterly_gdp_growth):
    print(f"{index + 1}) Quarterly GDP growth: {item}%")
1) Quarterly GDP growth: -2%
2) Quarterly GDP growth: -5.8%
3) Quarterly GDP growth: -2.8%
4) Quarterly GDP growth: 2.6%
5) Quarterly GDP growth: 7.3%
6) Quarterly GDP growth: 1.1%
7) Quarterly GDP growth: 3.4%
8) Quarterly GDP growth: 3.3%
9) Quarterly GDP growth: 3.6%
Scenario: We have Singapore’s Quarterly GDP growth rates, and we want to categorize them as “Growth”, “Stable”, or “Decline” based on their values.
Without conditionals and for loops
print(quarterly_gdp_growth[0], "% - Category: Decline")
print(quarterly_gdp_growth[1], "% - Category: Decline")
print(quarterly_gdp_growth[2], "% - Category: Decline")
print(quarterly_gdp_growth[3], "% - Category: Growth")
print(quarterly_gdp_growth[4], "% - Category: Growth")
-2 % - Category: Decline
-5.8 % - Category: Decline
-2.8 % - Category: Decline
2.6 % - Category: Growth
7.3 % - Category: Growth
With conditionals and for loops
for growth in quarterly_gdp_growth:
    if growth > 2:
        category = "Growth"
    elif -2 <= growth <= 2:
        category = "Stable"
    else:
        category = "Decline"
    print(f"{growth}% - Category: {category}")
-2% - Category: Stable
-5.8% - Category: Decline
-2.8% - Category: Decline
2.6% - Category: Growth
7.3% - Category: Growth
1.1% - Category: Stable
3.4% - Category: Growth
3.3% - Category: Growth
3.6% - Category: Growth
Let’s break down the anatomy of conditionals first!
Also called if-else structure, it looks like this at the very basic level:
gdp_growth = 2.5
if gdp_growth > 0:
    print("The economy is growing.")
else:
    print("The economy is not growing.")
The economy is growing.
Sometimes we have multiple conditions that we want to check! We can use elif (stands for else-if) to add the conditions. You can have multiple elif clauses.
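A sketch with two elif clauses. The thresholds, labels, and the name outlook are hypothetical, chosen only to illustrate the structure.

```python
gdp_growth = 1.5   # hypothetical value

if gdp_growth > 3:
    outlook = "strong growth"
elif gdp_growth > 0:       # checked only if the first condition is False
    outlook = "moderate growth"
elif gdp_growth == 0:      # checked only if both conditions above are False
    outlook = "no growth"
else:                      # runs when none of the conditions are True
    outlook = "contraction"

print("Outlook:", outlook)
```

Python checks the conditions from top to bottom and runs only the first branch whose condition is True.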
In some cases, we can’t split the conditions among multiple elifs. For example, we may want to check both the gdp_growth and inflation rates to determine the economic state.
gdp_growth = 1.5
inflation = 2.0
if gdp_growth > 0 and inflation < 3:
    print("The economy is in a good state.")
elif gdp_growth > 0 or inflation < 3:
    print("The economy is in a mixed state.")
else:
    print("The economy needs attention.")
and requires both conditions to be True
or requires at least one condition to be True
The economy is in a good state.
Assuming the variable growth has the value of 10, the conditionals below will give us the wrong result.
growth = 10
if growth > 0:
    print(f"{growth}% is a moderate growth")
elif growth > 5:  # Never reached: any growth > 5 already satisfies growth > 0 above
    print(f"{growth}% is a strong growth")
else:
    print(f"{growth}% is a negative growth")
10% is a moderate growth
This ordering would give the correct result:
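A sketch of the corrected ordering, with the more specific condition checked first. The labels are kept from the example above; storing the text in a variable called result is an assumption made for illustration.

```python
growth = 10

if growth > 5:        # most specific condition first
    result = f"{growth}% is a strong growth"
elif growth > 0:      # reached only when growth is between 0 and 5
    result = f"{growth}% is a moderate growth"
else:
    result = f"{growth}% is a negative growth"

print(result)
```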
For more complicated conditionals, you can put one conditional inside another! Be careful with indentation to ensure correct nesting!
gdp_growth = 2.5
unemployment = 4.0
if gdp_growth > 0:
    if unemployment < 5:
        print("Strong economic performance.")
    else:
        print("Growing economy, but high unemployment.")
else:
    print("Economic challenges ahead.")
Strong economic performance.
As seen from the first example of this section, we can combine for loops with conditionals:
for growth in quarterly_gdp_growth:
    if growth > 2:
        category = "Growth"
    elif -2 <= growth <= 2:
        category = "Stable"
    else:
        category = "Decline"
    print(f"{growth}% - Category: {category}")
-2% - Category: Stable
-5.8% - Category: Decline
-2.8% - Category: Decline
2.6% - Category: Growth
7.3% - Category: Growth
1.1% - Category: Stable
3.4% - Category: Growth
3.3% - Category: Growth
3.6% - Category: Growth
Another way to use for loops and conditionals together is to count occurrences. Let’s say we want to keep track of how many growths and contractions have happened over the past quarters.
growth_periods = 0
decline_periods = 0
for rate in quarterly_gdp_growth:
    if rate > 0:
        growth_periods += 1
    elif rate < 0:
        decline_periods += 1
print("Growth periods:", growth_periods)
print("Decline periods:", decline_periods)
Growth periods: 6
Decline periods: 3
Write a script that identifies the highest and lowest growth quarters (5 mins)
We will explore these in the upcoming sessions!
DateTime
NumPy arrays
pandas DataFrame & time series DataFrame
We have covered: variables, data types, lists, loops, and conditionals
Comments
Use comments to add a layer of documentation to your code. E.g., explain what a block of code does, etc.
Comments always start with a hash symbol (#).
Feel free to use this method to add your own notes throughout the workshop!