Introductory R for Social Sciences

Academic Year 24/25 Term 1

Welcome!

This site serves as a repository for the slides and codes developed for the ‘Introductory R for Social Sciences’ workshop tailored for undergraduate students at SMU. Comprising five sessions, this resource is designed for individuals with fundamental statistics knowledge who are venturing into R programming or coding for the first time.

Instructor

Instructor: Bella Ratmelia, Senior Librarian, Research & Data Services, SMU Libraries

Teaching Assistant: Hector Tan Kheng Wee, SCIS MITB Postgraduate Student

Schedule

Session Date and Time Venue Topic
1 Fri, 23 Aug 2024, 3.30 - 5.30 PM YPHSL seminar room 2.1 Introduction to R and RStudio
2 Fri, 30 Aug 2024, 3.30 - 5.30 PM YPHSL seminar room 2.1 Data wrangling with tidyverse
3 Fri, 6 Sep 2024, 3.30 - 5.30 PM YPHSL seminar room 2.1 Data visualization and Descriptive Statistics
4 Fri, 13 Sep 2024, 3.30 - 5.30 PM YPHSL seminar room 2.2 Basic Inferential Tests
5 Fri, 20 Sep 2024, 3.30 - 5.30 PM YPHSL seminar room 2.1 Regression and Introduction to Quarto
~ At your own pace At your own choosing Post-workshop exercises

Pre-workshop activities

R and RStudio

R and RStudio are two different but complementary tools used in data analysis and statistical computing. Understanding the difference between them is crucial (especially for beginners).

R is a programming language and software environment specifically designed for statistical computing and graphics. It’s the core tool that actually processes your data and performs analyses.

RStudio is an Integrated Development Environment (IDE) for R. Think of it as a user-friendly interface that makes it easier to write R code, manage projects, and visualize output. RStudio is not required to use R, but it significantly enhances the R programming experience!

Important

Always install R before installing RStudio, as RStudio requires R to function.

Step 1: Install R

  1. Go to https://cran.rstudio.com/

  2. Choose your operating system (Windows, Mac, or Linux)

  3. Download and install the latest version of R

Step 2: Install RStudio

  1. Visit https://posit.co/download/rstudio-desktop/

  2. Scroll down to “Download RStudio Desktop”

  3. Click “Download” and install the latest version of RStudio

Step 3: Install the required packages

Note

Packages in R are collections of additional tools, functions, and datasets that extend R’s capabilities. Packages are created by contributors in the R community.

Think of R as a smartphone, and packages as apps you download to add new features. Each package is designed to help with specific tasks or analyses, like creating graphs, analyzing particular types of data, or performing advanced statistical tests.

Packages can save us time and leverage expert-created tools without having to write complex code from scratch. It’s very handy!

  1. Open RStudio

  2. Copy the following code: install.packages(c( "car", "rmarkdown", "huxtable", "gt", "gtsummary", "tidyverse"))

  3. Paste it in the Console tab (see image below)

  4. Press Enter. Rstudio should proceed with installing the packages that we need.

Dataset

chile_voting

For this workshop, we will be using dataset stored in a CSV file called chile_voting.csv.

Click here to open chile_voting.csv, and then Ctrl + S / Cmd + S to save it locally into your data folder
What is a CSV?

A CSV (Comma-Separated Values) file is a type of file that stores data in a plain text format. Each line in the file represents a row of data, and within each row, individual pieces of data (like numbers or words) are separated by commas.

This format is commonly used for storing and transferring data, especially in spreadsheets and databases. Because it is literally just plain text, it is an ideal format if you have large amount of data.

You can open CSV files in Excel, Google Sheets, or even Notepad!

Data dictionary for chile_voting

The data is from a national survey conducted in April and May of 1988 by FLACSO/Chile, capturing voting intentions for the 1988 Chilean plebiscite. The dataset contains information about respondents’ demographic characteristics and their voting intentions. This data can also be found from carData package, but for the purpose of this workshop, we will load the data from a CSV.

Key variables in the dataset:


Explanatory notes on each column
Variable Description
region Region of voters: C (Central), M (Metropolitan Santiago area), N (North), S (South), SA (city of Santiago)
population Population size of respondent’s community
sex Sex of voters: F (female), M (male)
age Age in years
education Education level of voters: P (Primary), PS (Post-secondary), S (Secondary)
income Monthly income, in Pesos
statusquo Scale of support for the status-quo in numerical value
vote Voter’s decision: A (will abstain), N (will vote no), U (undecided), Y (will vote yes)

chile_voting_processed

For session 3 onwards, we will use chile_voting_processed.csv. This is the version that has been processed and cleaned. You can download this file below if you missed session 1 and 2. Save this inside the data-output folder of your project.

Click here to open chile_voting_processed.CSV, and then Ctrl + S / Cmd + S to save it locally into your data-output folder