Project Name: Recycling Rates by Waste Type

Problem Statement: Has the recycling rate in Singapore increased or decreased between 2000 and 2015?

Project Goals

  • Learn how to import a dataset with Pandas
  • Learn how to clean and manipulate data
  • Create data visualisations with the Matplotlib library
  • Gain familiarity with Jupyter Notebook and the Markdown language

Notebook Setup

In [1]:
%matplotlib inline

Import libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Select background style

In [3]:
plt.style.use('dark_background')

Data Exploration and Manipulation

Read from CSV File

In [4]:
df = pd.read_csv('data/recycling-rate-by-waste-type.csv')

Display the size of the data

In [5]:
df.shape  # This will display the number of rows and columns
Out[5]:
(224, 3)
In [6]:
df.columns # Displays the names of the columns
Out[6]:
Index(['year', 'waste_type', 'recycling_rate'], dtype='object')
In [7]:
df.dtypes  # Displays the data type of each column
Out[7]:
year               int64
waste_type        object
recycling_rate     int64
dtype: object

Display the first five rows in the dataset

In [8]:
df.head() # Displays the first five rows
Out[8]:
year waste_type recycling_rate
0 2000 Construction Debris 61
1 2000 Used Slag 45
2 2000 Ferrous Metal 92
3 2000 Scrap Tyres 54
4 2000 Non-Ferrous Metal 84

Display the last five rows in the dataset

In [9]:
df.tail() # Displays the last five rows
Out[9]:
year waste_type recycling_rate
219 2015 Glass 19
220 2015 Food 13
221 2015 Textiles 8
222 2015 Ash and Sludge 13
223 2015 Others 2

Check the dataset for missing values

In [10]:
df.isnull().sum()  # Counts the missing values in each column
Out[10]:
year              0
waste_type        0
recycling_rate    0
dtype: int64
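
The counts above are all zero, so nothing needs fixing in this dataset. As a minimal sketch of what the same check reports when a value is absent, the toy frame below is hypothetical (one rate is deliberately blanked out and it is not part of the dataset). It shows the per-column count and one possible way to drop the incomplete row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: the second rate is deliberately set to NaN for illustration
toy = pd.DataFrame({
    "year": [2000, 2000, 2000],
    "waste_type": ["Food", "Glass", "Wood"],
    "recycling_rate": [1, np.nan, 9],
})

missing_per_column = toy.isnull().sum()  # number of missing values in each column
toy_clean = toy.dropna()                 # drops every row that contains a missing value
```

Dropping rows is only one option; whether to drop or fill a gap depends on the analysis.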

Rename columns

In [11]:
df.columns = ['year', 'wastetype', 'recyclingrate']
df.head()
Out[11]:
year wastetype recyclingrate
0 2000 Construction Debris 61
1 2000 Used Slag 45
2 2000 Ferrous Metal 92
3 2000 Scrap Tyres 54
4 2000 Non-Ferrous Metal 84
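
Assigning a list to `df.columns` relies on the columns being in a known order. As a hedged alternative, `DataFrame.rename` maps each old name to its new name explicitly. The sketch below uses a small frame named `sample` (an illustrative name) built from the first rows shown above.

```python
import pandas as pd

# First rows of the dataset as displayed above, with the original column names
sample = pd.DataFrame({
    "year": [2000, 2000, 2000],
    "waste_type": ["Construction Debris", "Used Slag", "Ferrous Metal"],
    "recycling_rate": [61, 45, 92],
})

# Map old names to new names; columns that are not listed keep their names
renamed = sample.rename(columns={"waste_type": "wastetype",
                                 "recycling_rate": "recyclingrate"})
```

`rename` returns a new DataFrame and leaves `sample` unchanged, so the result has to be assigned to a variable.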

Summary of the dataset

In [12]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 224 entries, 0 to 223
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   year           224 non-null    int64 
 1   wastetype      224 non-null    object
 2   recyclingrate  224 non-null    int64 
dtypes: int64(2), object(1)
memory usage: 5.4+ KB

List the years covered in the dataset

In [13]:
df["year"].unique()
Out[13]:
array([2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
       2011, 2012, 2013, 2014, 2015])

Display all unique waste types

In [14]:
uniquewastetypes = df["wastetype"].unique()  # Stored in a variable for reuse
uniquewastetypes
Out[14]:
array(['Construction Debris', 'Used Slag', 'Ferrous Metal', 'Scrap Tyres',
       'Non-Ferrous Metal', 'Wood', 'Paper/Cardboard',
       'Horticultural Waste', 'Plastics', 'Glass', 'Food', 'Textiles',
       'Ash and Sludge', 'Others'], dtype=object)

Total number of waste types

In [15]:
totalwastetypes = len(uniquewastetypes)  # Total number of unique waste types in the dataset
totalwastetypes
Out[15]:
14
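
With 16 years and 14 waste types, a complete dataset should hold one row per combination. The sketch below shows one way to check this. To stay self-contained it builds a reduced frame named `subset` (an illustrative name) that covers only the years 2000 and 2015 and omits the rates.

```python
import pandas as pd

waste_types = ['Construction Debris', 'Used Slag', 'Ferrous Metal', 'Scrap Tyres',
               'Non-Ferrous Metal', 'Wood', 'Paper/Cardboard', 'Horticultural Waste',
               'Plastics', 'Glass', 'Food', 'Textiles', 'Ash and Sludge', 'Others']

# Reduced stand-in for df: two of the sixteen years, no rate column
subset = pd.DataFrame({
    "year": [2000] * 14 + [2015] * 14,
    "wastetype": waste_types * 2,
})

# Number of distinct waste types recorded in each year
types_per_year = subset.groupby("year")["wastetype"].nunique()

# True when the row count equals years x waste types
is_complete = len(subset) == subset["year"].nunique() * subset["wastetype"].nunique()
```

Run against the full `df`, the same comparison would be 224 == 16 * 14, which matches the shape reported earlier.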

Filter by the year 2015

In [16]:
year2015 = df['year'] == 2015  # Boolean mask: True for rows from 2015
In [17]:
df[year2015]
Out[17]:
year wastetype recyclingrate
210 2015 Construction Debris 99
211 2015 Used Slag 99
212 2015 Ferrous Metal 99
213 2015 Scrap Tyres 92
214 2015 Non-Ferrous Metal 89
215 2015 Wood 79
216 2015 Paper/Cardboard 51
217 2015 Horticultural Waste 66
218 2015 Plastics 7
219 2015 Glass 19
220 2015 Food 13
221 2015 Textiles 8
222 2015 Ash and Sludge 13
223 2015 Others 2
In [18]:
df['recyclingrate'] = df['recyclingrate'].astype(int) # Ensures that the recyclingrate column is an integer

Data Visualisation

Recycling Rates in 2015

In [19]:
chart1 = df[year2015].plot.barh(y='recyclingrate', x='wastetype', color='pink', figsize=(8,5))
chart1.set_title("Recycling Rates in 2015", fontsize=20)
chart1.set_xlabel("Recycling Rate in %", fontsize=13)
chart1.set_ylabel("Waste Type", fontsize=13)
Out[19]:
Text(0, 0.5, 'Waste Type')
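
The bars above follow the row order of the dataset. A possible refinement is to sort by rate before plotting so the chart reads as a ranking. The sketch below rebuilds the 2015 rows in a frame named `rates_2015` (an illustrative name) and sorts them.

```python
import pandas as pd

# 2015 rows as displayed in the filtered table above
rates_2015 = pd.DataFrame({
    "wastetype": ['Construction Debris', 'Used Slag', 'Ferrous Metal', 'Scrap Tyres',
                  'Non-Ferrous Metal', 'Wood', 'Paper/Cardboard', 'Horticultural Waste',
                  'Plastics', 'Glass', 'Food', 'Textiles', 'Ash and Sludge', 'Others'],
    "recyclingrate": [99, 99, 99, 92, 89, 79, 51, 66, 7, 19, 13, 8, 13, 2],
})

# barh draws the first row at the bottom, so ascending order puts the highest rate on top
sorted_2015 = rates_2015.sort_values("recyclingrate")
```

Calling `sorted_2015.plot.barh(x='wastetype', y='recyclingrate')` would then draw the bars in ranked order.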
In [20]:
year2000 = df['year'] == 2000
df[year2000]
Out[20]:
year wastetype recyclingrate
0 2000 Construction Debris 61
1 2000 Used Slag 45
2 2000 Ferrous Metal 92
3 2000 Scrap Tyres 54
4 2000 Non-Ferrous Metal 84
5 2000 Wood 9
6 2000 Paper/Cardboard 43
7 2000 Horticultural Waste 42
8 2000 Plastics 21
9 2000 Glass 14
10 2000 Food 1
11 2000 Textiles 0
12 2000 Ash and Sludge 0
13 2000 Others 1

Recycling Rates in 2000

In [21]:
chart2 = df[year2000].plot.barh(y='recyclingrate', x='wastetype', figsize=(8,4))
chart2.set_title("Recycling Rates in 2000", fontsize=20)
chart2.set_xlabel("Recycling Rate in %", fontsize=13)
chart2.set_ylabel("Waste Type", fontsize=13)
Out[21]:
Text(0, 0.5, 'Waste Type')
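
The two charts can also be compared numerically, which speaks directly to the problem statement. As a minimal sketch, the block below places the 2000 and 2015 rates side by side and computes the change per waste type. The names `rates`, `wide`, `increased` and `decreased` are illustrative, and the values are the ones shown in the two filtered tables.

```python
import pandas as pd

waste_types = ['Construction Debris', 'Used Slag', 'Ferrous Metal', 'Scrap Tyres',
               'Non-Ferrous Metal', 'Wood', 'Paper/Cardboard', 'Horticultural Waste',
               'Plastics', 'Glass', 'Food', 'Textiles', 'Ash and Sludge', 'Others']

rates = pd.DataFrame({
    "year": [2000] * 14 + [2015] * 14,
    "wastetype": waste_types * 2,
    "recyclingrate": [61, 45, 92, 54, 84, 9, 43, 42, 21, 14, 1, 0, 0, 1,     # 2000
                      99, 99, 99, 92, 89, 79, 51, 66, 7, 19, 13, 8, 13, 2],  # 2015
})

# One row per waste type, one column per year
wide = rates.pivot(index="wastetype", columns="year", values="recyclingrate")

# Change in percentage points between the two years
wide["change"] = wide[2015] - wide[2000]

increased = (wide["change"] > 0).sum()                # count of waste types with a higher rate
decreased = wide.index[wide["change"] < 0].tolist()   # waste types with a lower rate
```

On these figures, 13 of the 14 waste types had a higher recycling rate in 2015 than in 2000. Only Plastics fell, from 21% to 7%.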

Filter rows by food waste

In [22]:
foodwaste = df['wastetype'] == "Food"
In [23]:
df[foodwaste]
Out[23]:
year wastetype recyclingrate
10 2000 Food 1
24 2001 Food 6
38 2002 Food 6
52 2003 Food 6
66 2004 Food 6
80 2005 Food 7
94 2006 Food 8
108 2007 Food 9
122 2008 Food 12
136 2009 Food 13
150 2010 Food 16
164 2011 Food 10
178 2012 Food 12
192 2013 Food 13
206 2014 Food 13
220 2015 Food 13
In [24]:
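# Possible reshape to wide format (one row per year, one column per waste type):
# df_wide = df.pivot(index='year', columns='wastetype', values='recyclingrate')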
In [25]:
chart3 = df[foodwaste].plot(x='year', y='recyclingrate', color='lime', figsize=(10,7))
chart3.set_title("Food Waste Recycling Rate from 2000 to 2015", fontsize=20)
chart3.set_xlabel("Year", fontsize=13)
chart3.set_ylabel("Recycling Rate in %", fontsize=13)
Out[25]:
Text(0, 0.5, 'Recycling Rate in %')
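
The line chart shows the overall trend. A rough way to quantify it is to compute the year-over-year change. The sketch below rebuilds the food rows from the filtered table above in a frame named `food` (an illustrative name) and uses `Series.diff`.

```python
import pandas as pd

# Food waste recycling rates for 2000 to 2015, as listed in the filtered table
food = pd.DataFrame({
    "year": list(range(2000, 2016)),
    "recyclingrate": [1, 6, 6, 6, 6, 7, 8, 9, 12, 13, 16, 10, 12, 13, 13, 13],
}).set_index("year")

# Difference from the previous year; the first year has no predecessor, so it is NaN
food["change"] = food["recyclingrate"].diff()

peak_year = food["recyclingrate"].idxmax()   # year with the highest rate
largest_drop_year = food["change"].idxmin()  # year with the steepest fall
```

For these values the rate peaks at 16% in 2010 and the steepest fall, 6 percentage points, occurs in 2011.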