Introductory R for Social Sciences
Academic Year 24/25 Term 1
Welcome!
This site serves as a repository for the slides and codes developed for the ‘Introductory R for Social Sciences’ workshop tailored for undergraduate students at SMU. Comprising five sessions, this resource is designed for individuals with fundamental statistics knowledge who are venturing into R programming or coding for the first time.
Instructor
Instructor: Bella Ratmelia, Senior Librarian, Research & Data Services, SMU Libraries
Teaching Assistant: Hector Tan Kheng Wee, SCIS MITB Postgraduate Student
Schedule
Session | Date and Time | Venue | Topic |
---|---|---|---|
1 | Fri, 23 Aug 2024, 3.30 - 5.30 PM | YPHSL seminar room 2.1 | Introduction to R and RStudio |
2 | Fri, 30 Aug 2024, 3.30 - 5.30 PM | YPHSL seminar room 2.1 | Data wrangling with tidyverse |
3 | Fri, 6 Sep 2024, 3.30 - 5.30 PM | YPHSL seminar room 2.1 | Data visualization and Descriptive Statistics |
4 | Fri, 13 Sep 2024, 3.30 - 5.30 PM | YPHSL seminar room 2.2 | Basic Inferential Tests |
5 | Fri, 20 Sep 2024, 3.30 - 5.30 PM | YPHSL seminar room 2.1 | Regression and Introduction to Quarto |
~ | At your own pace | At your own choosing | Post-workshop exercises |
Pre-workshop activities
R and RStudio
R and RStudio are two different but complementary tools used in data analysis and statistical computing. Understanding the difference between them is crucial (especially for beginners).
R is a programming language and software environment specifically designed for statistical computing and graphics. It’s the core tool that actually processes your data and performs analyses.
RStudio is an Integrated Development Environment (IDE) for R. Think of it as a user-friendly interface that makes it easier to write R code, manage projects, and visualize output. RStudio is not required to use R, but it significantly enhances the R programming experience!
Always install R before installing RStudio, as RStudio requires R to function.
Step 1: Install R
Choose your operating system (Windows, Mac, or Linux)
Download and install the latest version of R
Step 2: Install RStudio
Scroll down to “Download RStudio Desktop”
Click “Download” and install the latest version of RStudio
Step 3: Install the required packages
Packages in R are collections of additional tools, functions, and datasets that extend R’s capabilities. Packages are created by contributors in the R community.
Think of R as a smartphone, and packages as apps you download to add new features. Each package is designed to help with specific tasks or analyses, like creating graphs, analyzing particular types of data, or performing advanced statistical tests.
Packages can save us time and leverage expert-created tools without having to write complex code from scratch. It’s very handy!
Open RStudio
Copy the following code:
install.packages(c( "car", "rmarkdown", "huxtable", "gt", "gtsummary", "tidyverse"))
Paste it in the Console tab (see image below)
Press Enter. Rstudio should proceed with installing the packages that we need.
Dataset
chile_voting
For this workshop, we will be using dataset stored in a CSV file called chile_voting.csv
.
A CSV (Comma-Separated Values) file is a type of file that stores data in a plain text format. Each line in the file represents a row of data, and within each row, individual pieces of data (like numbers or words) are separated by commas.
This format is commonly used for storing and transferring data, especially in spreadsheets and databases. Because it is literally just plain text, it is an ideal format if you have large amount of data.
You can open CSV files in Excel, Google Sheets, or even Notepad!
Data dictionary for chile_voting
The data is from a national survey conducted in April and May of 1988 by FLACSO/Chile, capturing voting intentions for the 1988 Chilean plebiscite. The dataset contains information about respondents’ demographic characteristics and their voting intentions. This data can also be found from carData
package, but for the purpose of this workshop, we will load the data from a CSV.
Key variables in the dataset:
Variable | Description |
---|---|
region |
Region of voters: C (Central), M (Metropolitan Santiago area), N (North), S (South), SA (city of Santiago) |
population |
Population size of respondent’s community |
sex |
Sex of voters: F (female), M (male) |
age |
Age in years |
education |
Education level of voters: P (Primary), PS (Post-secondary), S (Secondary) |
income |
Monthly income, in Pesos |
statusquo |
Scale of support for the status-quo in numerical value |
vote |
Voter’s decision: A (will abstain), N (will vote no), U (undecided), Y (will vote yes) |
chile_voting_processed
For session 3 onwards, we will use chile_voting_processed.csv
. This is the version that has been processed and cleaned. You can download this file below if you missed session 1 and 2. Save this inside the data-output
folder of your project.