Diabetes is a critical global health priority, as its rising prevalence places an increasing burden on healthcare systems and individual quality of life.
In this analysis, we examine diabetes prevalence in the 20 countries with the highest rates and track how prevalence changed from 2011 to 2021. By comparing these data with national obesity rates, the project uses exploratory data analysis to examine the relationship between obesity and the decade-long growth of the diabetes epidemic.
# Load required packages
library(tidyverse)
library(ggplot2)
# Read dataset
diabetes <- read.csv("~/Desktop/R project/3_Diabetes/Diabetes prevalence.csv")
str(diabetes)
## 'data.frame': 256 obs. of 19 variables:
## $ FREQ : chr "A" "A" "A" "A" ...
## $ FREQ_LABEL : chr "Annual" "Annual" "Annual" "Annual" ...
## $ REF_AREA : chr "ABW" "AFE" "AFG" "AFW" ...
## $ REF_AREA_LABEL : chr "Aruba" "Africa Eastern and Southern" "Afghanistan" "Africa Western and Central" ...
## $ INDICATOR : chr "WB_HNP_SH_STA_DIAB_ZS" "WB_HNP_SH_STA_DIAB_ZS" "WB_HNP_SH_STA_DIAB_ZS" "WB_HNP_SH_STA_DIAB_ZS" ...
## $ INDICATOR_LABEL : chr "Diabetes prevalence (% of population ages 20 to 79)" "Diabetes prevalence (% of population ages 20 to 79)" "Diabetes prevalence (% of population ages 20 to 79)" "Diabetes prevalence (% of population ages 20 to 79)" ...
## $ UNIT_MEASURE : chr "PT" "PT" "PT" "PT" ...
## $ UNIT_MEASURE_LABEL: chr "Percentage" "Percentage" "Percentage" "Percentage" ...
## $ DATABASE_ID : chr "WB_HNP" "WB_HNP" "WB_HNP" "WB_HNP" ...
## $ DATABASE_ID_LABEL : chr "Health Nutrition and Population Statistics" "Health Nutrition and Population Statistics" "Health Nutrition and Population Statistics" "Health Nutrition and Population Statistics" ...
## $ UNIT_MULT : int 0 0 0 0 0 0 0 0 0 0 ...
## $ UNIT_MULT_LABEL : chr "Units" "Units" "Units" "Units" ...
## $ OBS_STATUS : chr "A" "A" "A" "A" ...
## $ OBS_STATUS_LABEL : chr "Normal value" "Normal value" "Normal value" "Normal value" ...
## $ OBS_CONF : chr "PU" "PU" "PU" "PU" ...
## $ OBS_CONF_LABEL : chr "Public" "Public" "Public" "Public" ...
## $ X2000 : num 12.1 NA NA NA NA NA NA NA NA NA ...
## $ X2011 : num 12.4 4.59 7.6 4.41 2.9 ...
## $ X2021 : num 4.3 7.38 10.9 3.39 4.6 ...
obesity <- read.csv("~/Desktop/R project/3_Diabetes/Obesity prevalance.csv")
str(obesity)
## 'data.frame': 398 obs. of 34 variables:
## $ IndicatorCode : chr "NCD_BMI_30C" "NCD_BMI_30C" "NCD_BMI_30C" "NCD_BMI_30C" ...
## $ Indicator : chr "Prevalence of obesity among adults, BMI ≥ 30 (crude estimate) (%)" "Prevalence of obesity among adults, BMI ≥ 30 (crude estimate) (%)" "Prevalence of obesity among adults, BMI ≥ 30 (crude estimate) (%)" "Prevalence of obesity among adults, BMI ≥ 30 (crude estimate) (%)" ...
## $ ValueType : chr "numeric" "numeric" "numeric" "numeric" ...
## $ ParentLocationCode : chr "WPR" "SEAR" "AFR" "AFR" ...
## $ ParentLocation : chr "Western Pacific" "South-East Asia" "Africa" "Africa" ...
## $ Location.type : chr "Country" "Country" "Country" "Country" ...
## $ SpatialDimValueCode : chr "VNM" "LKA" "AGO" "CIV" ...
## $ Location : chr "Viet Nam" "Sri Lanka" "Angola" "Cote d'Ivoire" ...
## $ Period.type : chr "Year" "Year" "Year" "Year" ...
## $ Period : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
## $ IsLatestYear : chr "false" "false" "false" "false" ...
## $ Dim1.type : chr "Sex" "Sex" "Sex" "Sex" ...
## $ Dim1 : chr "Both sexes" "Both sexes" "Both sexes" "Both sexes" ...
## $ Dim1ValueCode : chr "SEX_BTSX" "SEX_BTSX" "SEX_BTSX" "SEX_BTSX" ...
## $ Dim2.type : chr "Age Group" "Age Group" "Age Group" "Age Group" ...
## $ Dim2 : chr "18+ years" "18+ years" "18+ years" "18+ years" ...
## $ Dim2ValueCode : chr "AGEGROUP_YEARS18-PLUS" "AGEGROUP_YEARS18-PLUS" "AGEGROUP_YEARS18-PLUS" "AGEGROUP_YEARS18-PLUS" ...
## $ Dim3.type : logi NA NA NA NA NA NA ...
## $ Dim3 : logi NA NA NA NA NA NA ...
## $ Dim3ValueCode : logi NA NA NA NA NA NA ...
## $ DataSourceDimValueCode : logi NA NA NA NA NA NA ...
## $ DataSource : logi NA NA NA NA NA NA ...
## $ FactValueNumericPrefix : logi NA NA NA NA NA NA ...
## $ FactValueNumeric : num 1.91 10.04 10.19 10.17 10.34 ...
## $ FactValueUoM : logi NA NA NA NA NA NA ...
## $ FactValueNumericLowPrefix : logi NA NA NA NA NA NA ...
## $ FactValueNumericLow : num 1.48 8.62 6.76 8.18 5.57 5.46 8.89 9.36 8.9 9.33 ...
## $ FactValueNumericHighPrefix: logi NA NA NA NA NA NA ...
## $ FactValueNumericHigh : num 2.42 11.57 14.26 12.6 16.65 ...
## $ Value : chr "1.9 [1.5-2.4]" "10.0 [8.6-11.6]" "10.2 [6.8-14.3]" "10.2 [8.2-12.6]" ...
## $ FactValueTranslationID : logi NA NA NA NA NA NA ...
## $ FactComments : logi NA NA NA NA NA NA ...
## $ Language : chr "EN" "EN" "EN" "EN" ...
## $ DateModified : chr "2024-02-28T14:00:00.000Z" "2024-02-28T14:00:00.000Z" "2024-02-28T14:00:00.000Z" "2024-02-28T14:00:00.000Z" ...
head(diabetes)
## FREQ FREQ_LABEL REF_AREA REF_AREA_LABEL INDICATOR
## 1 A Annual ABW Aruba WB_HNP_SH_STA_DIAB_ZS
## 2 A Annual AFE Africa Eastern and Southern WB_HNP_SH_STA_DIAB_ZS
## 3 A Annual AFG Afghanistan WB_HNP_SH_STA_DIAB_ZS
## 4 A Annual AFW Africa Western and Central WB_HNP_SH_STA_DIAB_ZS
## 5 A Annual AGO Angola WB_HNP_SH_STA_DIAB_ZS
## 6 A Annual ALB Albania WB_HNP_SH_STA_DIAB_ZS
## INDICATOR_LABEL UNIT_MEASURE
## 1 Diabetes prevalence (% of population ages 20 to 79) PT
## 2 Diabetes prevalence (% of population ages 20 to 79) PT
## 3 Diabetes prevalence (% of population ages 20 to 79) PT
## 4 Diabetes prevalence (% of population ages 20 to 79) PT
## 5 Diabetes prevalence (% of population ages 20 to 79) PT
## 6 Diabetes prevalence (% of population ages 20 to 79) PT
## UNIT_MEASURE_LABEL DATABASE_ID DATABASE_ID_LABEL
## 1 Percentage WB_HNP Health Nutrition and Population Statistics
## 2 Percentage WB_HNP Health Nutrition and Population Statistics
## 3 Percentage WB_HNP Health Nutrition and Population Statistics
## 4 Percentage WB_HNP Health Nutrition and Population Statistics
## 5 Percentage WB_HNP Health Nutrition and Population Statistics
## 6 Percentage WB_HNP Health Nutrition and Population Statistics
## UNIT_MULT UNIT_MULT_LABEL OBS_STATUS OBS_STATUS_LABEL OBS_CONF OBS_CONF_LABEL
## 1 0 Units A Normal value PU Public
## 2 0 Units A Normal value PU Public
## 3 0 Units A Normal value PU Public
## 4 0 Units A Normal value PU Public
## 5 0 Units A Normal value PU Public
## 6 0 Units A Normal value PU Public
## X2000 X2011 X2021
## 1 12.1 12.400000 4.300000
## 2 NA 4.587181 7.381941
## 3 NA 7.600000 10.900000
## 4 NA 4.412739 3.389805
## 5 NA 2.900000 4.600000
## 6 NA 2.800000 10.200000
summary(diabetes)
## FREQ FREQ_LABEL REF_AREA REF_AREA_LABEL
## Length:256 Length:256 Length:256 Length:256
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## INDICATOR INDICATOR_LABEL UNIT_MEASURE UNIT_MEASURE_LABEL
## Length:256 Length:256 Length:256 Length:256
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## DATABASE_ID DATABASE_ID_LABEL UNIT_MULT UNIT_MULT_LABEL
## Length:256 Length:256 Min. :0 Length:256
## Class :character Class :character 1st Qu.:0 Class :character
## Mode :character Mode :character Median :0 Mode :character
## Mean :0
## 3rd Qu.:0
## Max. :0
##
## OBS_STATUS OBS_STATUS_LABEL OBS_CONF OBS_CONF_LABEL
## Length:256 Length:256 Length:256 Length:256
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## X2000 X2011 X2021
## Min. : 0.00 Min. : 1.900 Min. : 1.100
## 1st Qu.:11.88 1st Qu.: 5.200 1st Qu.: 5.800
## Median :12.10 Median : 7.601 Median : 7.900
## Mean :11.47 Mean : 8.013 Mean : 9.068
## 3rd Qu.:13.88 3rd Qu.: 9.500 3rd Qu.:10.983
## Max. :15.50 Max. :25.300 Max. :30.800
## NA's :238 NA's :7
colSums(is.na(diabetes))
## FREQ FREQ_LABEL REF_AREA REF_AREA_LABEL
## 0 0 0 0
## INDICATOR INDICATOR_LABEL UNIT_MEASURE UNIT_MEASURE_LABEL
## 0 0 0 0
## DATABASE_ID DATABASE_ID_LABEL UNIT_MULT UNIT_MULT_LABEL
## 0 0 0 0
## OBS_STATUS OBS_STATUS_LABEL OBS_CONF OBS_CONF_LABEL
## 0 0 0 0
## X2000 X2011 X2021
## 238 7 0
head(obesity)
## IndicatorCode
## 1 NCD_BMI_30C
## 2 NCD_BMI_30C
## 3 NCD_BMI_30C
## 4 NCD_BMI_30C
## 5 NCD_BMI_30C
## 6 NCD_BMI_30C
## Indicator
## 1 Prevalence of obesity among adults, BMI ≥ 30 (crude estimate) (%)
## 2 Prevalence of obesity among adults, BMI ≥ 30 (crude estimate) (%)
## 3 Prevalence of obesity among adults, BMI ≥ 30 (crude estimate) (%)
## 4 Prevalence of obesity among adults, BMI ≥ 30 (crude estimate) (%)
## 5 Prevalence of obesity among adults, BMI ≥ 30 (crude estimate) (%)
## 6 Prevalence of obesity among adults, BMI ≥ 30 (crude estimate) (%)
## ValueType ParentLocationCode ParentLocation Location.type
## 1 numeric WPR Western Pacific Country
## 2 numeric SEAR South-East Asia Country
## 3 numeric AFR Africa Country
## 4 numeric AFR Africa Country
## 5 numeric EMR Eastern Mediterranean Country
## 6 numeric SEAR South-East Asia Country
## SpatialDimValueCode Location Period.type Period
## 1 VNM Viet Nam Year 2021
## 2 LKA Sri Lanka Year 2021
## 3 AGO Angola Year 2021
## 4 CIV Cote d'Ivoire Year 2021
## 5 DJI Djibouti Year 2021
## 6 PRK Democratic People's Republic of Korea Year 2021
## IsLatestYear Dim1.type Dim1 Dim1ValueCode Dim2.type Dim2
## 1 false Sex Both sexes SEX_BTSX Age Group 18+ years
## 2 false Sex Both sexes SEX_BTSX Age Group 18+ years
## 3 false Sex Both sexes SEX_BTSX Age Group 18+ years
## 4 false Sex Both sexes SEX_BTSX Age Group 18+ years
## 5 false Sex Both sexes SEX_BTSX Age Group 18+ years
## 6 false Sex Both sexes SEX_BTSX Age Group 18+ years
## Dim2ValueCode Dim3.type Dim3 Dim3ValueCode DataSourceDimValueCode
## 1 AGEGROUP_YEARS18-PLUS NA NA NA NA
## 2 AGEGROUP_YEARS18-PLUS NA NA NA NA
## 3 AGEGROUP_YEARS18-PLUS NA NA NA NA
## 4 AGEGROUP_YEARS18-PLUS NA NA NA NA
## 5 AGEGROUP_YEARS18-PLUS NA NA NA NA
## 6 AGEGROUP_YEARS18-PLUS NA NA NA NA
## DataSource FactValueNumericPrefix FactValueNumeric FactValueUoM
## 1 NA NA 1.91 NA
## 2 NA NA 10.04 NA
## 3 NA NA 10.19 NA
## 4 NA NA 10.17 NA
## 5 NA NA 10.34 NA
## 6 NA NA 10.38 NA
## FactValueNumericLowPrefix FactValueNumericLow FactValueNumericHighPrefix
## 1 NA 1.48 NA
## 2 NA 8.62 NA
## 3 NA 6.76 NA
## 4 NA 8.18 NA
## 5 NA 5.57 NA
## 6 NA 5.46 NA
## FactValueNumericHigh Value FactValueTranslationID FactComments
## 1 2.42 1.9 [1.5-2.4] NA NA
## 2 11.57 10.0 [8.6-11.6] NA NA
## 3 14.26 10.2 [6.8-14.3] NA NA
## 4 12.60 10.2 [8.2-12.6] NA NA
## 5 16.65 10.3 [5.6-16.7] NA NA
## 6 16.98 10.4 [5.5-17.0] NA NA
## Language DateModified
## 1 EN 2024-02-28T14:00:00.000Z
## 2 EN 2024-02-28T14:00:00.000Z
## 3 EN 2024-02-28T14:00:00.000Z
## 4 EN 2024-02-28T14:00:00.000Z
## 5 EN 2024-02-28T14:00:00.000Z
## 6 EN 2024-02-28T14:00:00.000Z
summary(obesity)
## IndicatorCode Indicator ValueType ParentLocationCode
## Length:398 Length:398 Length:398 Length:398
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## ParentLocation Location.type SpatialDimValueCode Location
## Length:398 Length:398 Length:398 Length:398
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Period.type Period IsLatestYear Dim1.type
## Length:398 Min. :2011 Length:398 Length:398
## Class :character 1st Qu.:2011 Class :character Class :character
## Mode :character Median :2016 Mode :character Mode :character
## Mean :2016
## 3rd Qu.:2021
## Max. :2021
## Dim1 Dim1ValueCode Dim2.type Dim2
## Length:398 Length:398 Length:398 Length:398
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Dim2ValueCode Dim3.type Dim3 Dim3ValueCode
## Length:398 Mode:logical Mode:logical Mode:logical
## Class :character NA's:398 NA's:398 NA's:398
## Mode :character
##
##
##
## DataSourceDimValueCode DataSource FactValueNumericPrefix FactValueNumeric
## Mode:logical Mode:logical Mode:logical Min. : 0.74
## NA's:398 NA's:398 NA's:398 1st Qu.:10.23
## Median :21.08
## Mean :21.80
## 3rd Qu.:28.45
## Max. :75.30
## FactValueUoM FactValueNumericLowPrefix FactValueNumericLow
## Mode:logical Mode:logical Min. : 0.650
## NA's:398 NA's:398 1st Qu.: 8.613
## Median :18.550
## Mean :19.366
## 3rd Qu.:25.085
## Max. :68.860
## FactValueNumericHighPrefix FactValueNumericHigh Value
## Mode:logical Min. : 0.84 Length:398
## NA's:398 1st Qu.:12.31 Class :character
## Median :23.84 Mode :character
## Mean :24.38
## 3rd Qu.:32.59
## Max. :80.99
## FactValueTranslationID FactComments Language DateModified
## Mode:logical Mode:logical Length:398 Length:398
## NA's:398 NA's:398 Class :character Class :character
## Mode :character Mode :character
##
##
##
colSums(is.na(obesity))
## IndicatorCode Indicator
## 0 0
## ValueType ParentLocationCode
## 0 0
## ParentLocation Location.type
## 0 0
## SpatialDimValueCode Location
## 0 0
## Period.type Period
## 0 0
## IsLatestYear Dim1.type
## 0 0
## Dim1 Dim1ValueCode
## 0 0
## Dim2.type Dim2
## 0 0
## Dim2ValueCode Dim3.type
## 0 398
## Dim3 Dim3ValueCode
## 398 398
## DataSourceDimValueCode DataSource
## 398 398
## FactValueNumericPrefix FactValueNumeric
## 398 0
## FactValueUoM FactValueNumericLowPrefix
## 398 398
## FactValueNumericLow FactValueNumericHighPrefix
## 0 398
## FactValueNumericHigh Value
## 0 0
## FactValueTranslationID FactComments
## 398 398
## Language DateModified
## 0 0
diabetes_1 <- diabetes %>%
select(REF_AREA, REF_AREA_LABEL, X2011, X2021) %>%
mutate(diabetes_change = X2021 - X2011) %>%
rename(Location = REF_AREA_LABEL,
diabetes_2021 = X2021,
diabetes_2011 = X2011,
code = REF_AREA)
obesity_1 <- obesity %>%
select(SpatialDimValueCode, Location, Period, Value) %>%
pivot_wider(
names_from = Period,
values_from = Value
) %>%
rename(obesity_2021 = "2021",
obesity_2011 = "2011",
code = SpatialDimValueCode) %>%
mutate(
obesity_2011 = as.numeric(str_extract(obesity_2011, "^[0-9.]+")),
obesity_2021 = as.numeric(str_extract(obesity_2021, "^[0-9.]+")),
obesity_change = obesity_2021 - obesity_2011
)
diabetes_summary_avg <- diabetes_1 %>%
summarise(
dm_mean_2011 = mean(diabetes_2011, na.rm = TRUE),
dm_mean_2021 = mean(diabetes_2021, na.rm = TRUE),
dm_mean_change = mean(diabetes_change, na.rm = TRUE)
)
print(diabetes_summary_avg)
## dm_mean_2011 dm_mean_2021 dm_mean_change
## 1 8.012563 9.067681 1.007623
The unweighted mean diabetes prevalence across all reporting areas in the dataset (countries and regional aggregates) rose from 8.01% in 2011 to 9.07% in 2021.
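One detail in the summary is worth spelling out: the mean change (1.01) is smaller than the gap between the two yearly means (9.07 - 8.01, about 1.06). The likely reason is that 7 areas have no 2011 value, so they count toward the 2021 mean but not toward the mean change. A minimal sketch with made-up numbers (the vectors below are illustrative, not taken from the dataset):

```r
# Illustrative values only: the first area has no 2011 observation
d2011 <- c(NA, 4, 6, 8)
d2021 <- c(20, 5, 7, 9)

# Difference of the two yearly means (each mean drops its own NAs)
diff_of_means <- mean(d2021, na.rm = TRUE) - mean(d2011, na.rm = TRUE)

# Mean of the paired changes (only areas observed in both years)
mean_of_changes <- mean(d2021 - d2011, na.rm = TRUE)

print(c(diff_of_means = diff_of_means, mean_of_changes = mean_of_changes))
```

Restricting both yearly means to areas observed in both years would make the two figures agree.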
diabetes_long <- diabetes_1 %>%
select(Location, diabetes_2011, diabetes_2021) %>%
pivot_longer(cols = starts_with("diabetes"),
names_to = "Year",
values_to = "Prevalence") %>%
mutate(Year = ifelse(Year == "diabetes_2011", "2011", "2021")) %>%
filter(!is.na(Prevalence))
ggplot(diabetes_long, aes(Year, Prevalence)) +
geom_boxplot(fill = "skyblue") +
labs(title = "Diabetes Prevalence (2011 vs. 2021)", y = "Prevalence (%)", x = NULL) +
theme_minimal()
The boxplot shows a clear upward shift in diabetes prevalence from 2011 to 2021, with both the median and the overall distribution reaching higher levels. The larger number of outliers in 2021 indicates that more countries are facing extreme prevalence rates, with the highest now exceeding 30%.
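The outlier comparison can be checked numerically rather than read off the plot. The sketch below counts points lying more than 1.5 times the hinge spread beyond the hinges, using base R's `boxplot.stats()`. `count_outliers` is a hypothetical helper and the vectors are made-up values. `geom_boxplot()` computes its quartiles slightly differently, so the counts may not match the plot exactly.

```r
# Hypothetical helper: number of boxplot outliers in a numeric vector
count_outliers <- function(x) {
  x <- x[!is.na(x)]                 # drop missing values first
  length(boxplot.stats(x)$out)      # points beyond 1.5 * hinge spread
}

# Made-up prevalence values for two years
toy_2011 <- c(3, 5, 6, 7, 8, 9, 10, 12, 13, NA)
toy_2021 <- c(3, 5, 6, 7, 8, 9, 10, 12, 25, 31)

outlier_counts <- c(`2011` = count_outliers(toy_2011),
                    `2021` = count_outliers(toy_2021))
print(outlier_counts)
```

On the real data, the same helper could be applied to `diabetes_1$diabetes_2011` and `diabetes_1$diabetes_2021`.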
top20_DM_2021 <- diabetes_1 %>%
arrange(desc(diabetes_2021)) %>%
head(20)
ggplot(top20_DM_2021, aes(x = reorder(Location, diabetes_2021), y = diabetes_2021)) +
geom_col(fill = "skyblue") +
geom_text(aes(label = round(diabetes_2021,1)),
hjust = -0.1) +
labs(title = "Top 20 Countries by Diabetes Prevalence in 2021",
subtitle = "Prevalence among adults aged 20–79",
x = NULL, y = "Prevalence (%)") +
coord_flip() +
theme_minimal()
Pakistan leads with a 30.8% prevalence, well above the other nations on the list. The top 20 is heavily concentrated in Pacific Island nations and the Middle East.
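The diabetes table mixes countries with regional aggregates (for example "Africa Eastern and Southern"), so a ranking labelled as countries may want to exclude the aggregates first. A minimal sketch, assuming a vector of country codes is available (the hypothetical `country_codes` below; in this project the codes in the obesity table could plausibly serve that role). The rows are copied from the `head(diabetes)` output above.

```r
# First six rows of the diabetes table, as printed by head(diabetes)
diabetes_toy <- data.frame(
  code = c("ABW", "AFE", "AFG", "AFW", "AGO", "ALB"),
  Location = c("Aruba", "Africa Eastern and Southern", "Afghanistan",
               "Africa Western and Central", "Angola", "Albania"),
  diabetes_2021 = c(4.3, 7.381941, 10.9, 3.389805, 4.6, 10.2)
)

# Hypothetical list of codes known to be countries (not aggregates)
country_codes <- c("ABW", "AFG", "AGO", "ALB")

# Keep only countries, then rank by 2021 prevalence
countries_only <- diabetes_toy[diabetes_toy$code %in% country_codes, ]
top3 <- head(countries_only[order(-countries_only$diabetes_2021), ], 3)
print(top3)
```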
diabetes_1 %>%
filter(!is.na(diabetes_change)) %>%
ggplot(aes(y = diabetes_change)) +
geom_boxplot(fill = "skyblue") +
labs(title = "Distribution of Prevalence Change (2011-2021)",
y = "Change in Prevalence (%)", x = NULL) +
theme_minimal() +
theme(axis.text.x = element_blank())
The boxplot of prevalence change shows that most countries experienced a modest increase, while the most extreme outliers saw increases of over 20 percentage points.
top20_DMchange <- diabetes_1 %>%
arrange(desc(abs(diabetes_change))) %>%
head(20)
ggplot(top20_DMchange, aes(x = reorder(Location, abs(diabetes_change)), y = diabetes_change)) +
geom_col(fill = "skyblue") +
geom_text(aes(label = round(diabetes_change, 1)), hjust = ifelse(top20_DMchange$diabetes_change > 0, -0.1, 1.1)) +
coord_flip() +
labs(
title = "Top 20 Countries by Diabetes Prevalence Change (2011–2021)",
subtitle = "Prevalence among adults aged 20–79",
y = "Prevalence Change (%)",
x = NULL
) +
theme_minimal()
While Pakistan shows a striking increase of 22.9 percentage points, the chart also reveals sizeable decreases in countries such as Lebanon and Bahrain.
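Changes in prevalence are differences between two percentages, so they are best read as percentage points rather than percent. The sketch below contrasts the two for Afghanistan, using the 2011 and 2021 values shown in the `head(diabetes)` output (7.6 and 10.9):

```r
# Afghanistan, values taken from the head(diabetes) output
prev_2011 <- 7.6
prev_2021 <- 10.9

# Absolute change, in percentage points (what diabetes_change measures)
abs_change_pp <- prev_2021 - prev_2011

# Relative change, in percent of the 2011 level
rel_change_pct <- (prev_2021 - prev_2011) / prev_2011 * 100

print(round(c(abs_change_pp = abs_change_pp, rel_change_pct = rel_change_pct), 1))
```

The two measures can rank countries quite differently, since a small absolute rise from a low starting level is a large relative rise.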
obesity_summary_avg <- obesity_1 %>%
summarise(
obesity_mean_2011 = mean(obesity_2011, na.rm = TRUE),
obesity_2021 = mean(obesity_2021, na.rm = TRUE),
obesity_change = mean(obesity_change, na.rm = TRUE)
)
print(obesity_summary_avg)
## # A tibble: 1 × 3
## obesity_mean_2011 obesity_2021 obesity_change
## <dbl> <dbl> <dbl>
## 1 19.4 24.2 4.72
Mean obesity prevalence rose from 19.4% to 24.2% over the ten years, with an average increase of about 4.7 percentage points per country.
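The obesity figures here come from parsing the leading number of the `Value` strings (such as "1.9 [1.5-2.4]"), which are rounded to one decimal place, while the `FactValueNumeric` column holds the same estimates with two decimals. A small base-R sketch of the parsing step, using the first three rows shown in the `str(obesity)` output, gives a sense of how little the rounding matters:

```r
# First three rows of the obesity table, as shown by str(obesity)
value <- c("1.9 [1.5-2.4]", "10.0 [8.6-11.6]", "10.2 [6.8-14.3]")
fact_value_numeric <- c(1.91, 10.04, 10.19)

# Keep the leading number and drop the bracketed interval
parsed <- as.numeric(sub("^([0-9.]+).*$", "\\1", value))

# Largest discrepancy between the parsed and the unrounded estimate
max_gap <- max(abs(parsed - fact_value_numeric))
print(parsed)
print(max_gap)
```

Reading `FactValueNumeric` directly would avoid the string parsing altogether, at the cost of slightly different rounding in the summaries.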
obesity_long <- obesity_1 %>%
pivot_longer(
cols = c(obesity_2011, obesity_2021),
names_to = "Year",
values_to = "Prevalence"
) %>%
mutate(Year = str_replace(Year, "obesity_", "")) %>%
filter(!is.na(Prevalence))
ggplot(obesity_long, aes(Year, Prevalence)) +
geom_boxplot(fill = "skyblue") +
labs(title = "Obesity Prevalence (2011 vs. 2021)", y = "Prevalence (%)", x = NULL) +
theme_minimal()
The boxplot shows a clear upward shift in global obesity rates from 2011 to 2021. Both the median and overall distribution have moved higher.
obesity_1 %>%
filter(!is.na(obesity_change)) %>%
ggplot(aes(y = obesity_change)) +
geom_boxplot(fill = "skyblue") +
labs(title = "Distribution of Prevalence Change (2011-2021)",
y = "Change in Prevalence (%)", x = NULL) +
theme_minimal() +
theme(axis.text.x = element_blank())
The median increase in obesity prevalence is around 4.7 percentage points, and the upward trend is broad-based. Most countries saw growth of 3 to 7 percentage points over the decade.
top20_Obesitychange <- obesity_1 %>%
arrange(desc(abs(obesity_change))) %>%
head(20)
ggplot(top20_Obesitychange, aes(x = reorder(Location, abs(obesity_change)), y = obesity_change)) +
geom_col(fill = "skyblue") +
geom_text(aes(label = round(obesity_change, 1)), hjust = ifelse(top20_Obesitychange$obesity_change > 0, -0.1, 1.1)) +
coord_flip() +
labs (
title = "Top 20 Countries by Obesity Prevalence Change (2011–2021)",
subtitle = "Prevalence among adults (18 years and older)",
y = "Prevalence Change (%)",
x = NULL
) +
theme_minimal()
Romania recorded the largest increase in obesity prevalence, rising by over 13 percentage points. Several other nations, including Uzbekistan and Pakistan, also saw substantial growth exceeding 9 percentage points.
data <- obesity_1 %>%
left_join(diabetes_1, by = "code")
data <- data %>%
select(
code,
Country = Location.x,
obesity_2011, obesity_2021, obesity_change,
diabetes_2011, diabetes_2021, diabetes_change
)
nrow(obesity_1)
## [1] 199
nrow(data)
## [1] 199
colSums(is.na(data))
## code Country obesity_2011 obesity_2021 obesity_change
## 0 0 0 0 0
## diabetes_2011 diabetes_2021 diabetes_change
## 6 3 6
data %>%
summarise(
avg_obesity_2021 = mean(obesity_2021, na.rm = TRUE),
avg_diabetes_2021 = mean(diabetes_2021, na.rm = TRUE)
)
## # A tibble: 1 × 2
## avg_obesity_2021 avg_diabetes_2021
## <dbl> <dbl>
## 1 24.2 8.89
missing_list <- data %>%
filter(is.na(diabetes_change)) %>%
select(code, Country)
print(missing_list)
## # A tibble: 6 × 2
## code Country
## <chr> <chr>
## 1 GRL Greenland
## 2 NIU Niue
## 3 COK Cook Islands
## 4 TKL Tokelau
## 5 SSD South Sudan
## 6 ASM American Samoa
data_clean <- data %>%
drop_na(obesity_change, diabetes_change)
cor.test(data_clean$diabetes_2021, data_clean$obesity_2021)
##
## Pearson's product-moment correlation
##
## data: data_clean$diabetes_2021 and data_clean$obesity_2021
## t = 8.3553, df = 191, p-value = 1.32e-14
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4057778 0.6137577
## sample estimates:
## cor
## 0.5173666
model_2021 <- lm(diabetes_2021 ~ obesity_2021, data = data_clean)
summary(model_2021)
##
## Call:
## lm(formula = diabetes_2021 ~ obesity_2021, data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.3721 -3.1511 -0.9643 2.1914 22.4557
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.07565 0.65743 6.199 3.42e-09 ***
## obesity_2021 0.20621 0.02468 8.355 1.32e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.436 on 191 degrees of freedom
## Multiple R-squared: 0.2677, Adjusted R-squared: 0.2638
## F-statistic: 69.81 on 1 and 191 DF, p-value: 1.32e-14
ggplot(data_clean, aes(x = obesity_2021, y = diabetes_2021)) +
geom_point() +
labs(title = "Relationship between Obesity and Diabetes Prevalence (2021)",
x = "Obesity prevalence",
y = "Diabetes prevalence"
) +
geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
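The upward-sloping regression line shows a moderate positive correlation between obesity and diabetes prevalence in 2021 (r ≈ 0.52, p < 0.001). However, obesity explains only about 27% of the variance in diabetes prevalence, and the large outliers indicate that other factors also influence diabetes rates.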
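The regression output can be read in two quick steps: in a one-predictor model, R-squared equals the squared Pearson correlation, and the slope gives the expected difference in diabetes prevalence per percentage point of obesity. A small arithmetic sketch using the figures reported above:

```r
# Figures copied from the cor.test() and summary(model_2021) output
r <- 0.5173666      # Pearson correlation
slope <- 0.20621    # coefficient on obesity_2021

# Share of variance explained: r squared matches the reported R-squared
r_squared <- r^2

# Expected difference in diabetes prevalence (percentage points)
# between two countries whose obesity prevalence differs by 10 points
delta_per_10 <- slope * 10

print(round(c(r_squared = r_squared, delta_per_10 = delta_per_10), 4))
```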
cor.test(data_clean$obesity_change, data_clean$diabetes_change)
##
## Pearson's product-moment correlation
##
## data: data_clean$obesity_change and data_clean$diabetes_change
## t = 0.24801, df = 191, p-value = 0.8044
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1236110 0.1587801
## sample estimates:
## cor
## 0.01794237
model_change <- lm(diabetes_change ~ obesity_change, data = data_clean)
summary(model_change)
##
## Call:
## lm(formula = diabetes_change ~ obesity_change, data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.4930 -1.9550 -0.4967 1.7856 21.9030
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.75102 0.55071 1.364 0.174
## obesity_change 0.02536 0.10225 0.248 0.804
##
## Residual standard error: 3.657 on 191 degrees of freedom
## Multiple R-squared: 0.0003219, Adjusted R-squared: -0.004912
## F-statistic: 0.06151 on 1 and 191 DF, p-value: 0.8044
ggplot(data_clean, aes(x = obesity_change, y = diabetes_change)) +
geom_point() +
labs(title = "Relationship between Obesity and Diabetes Prevalence Change (2011-2021)",
x = "Obesity prevalence change",
y = "Diabetes prevalence change"
) +
geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
The nearly flat regression line indicates essentially no linear association between 10-year changes in obesity and diabetes (r ≈ 0.02, p = 0.80). This suggests that national diabetes trends are driven by factors beyond obesity growth alone.
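Because a few countries changed far more than the rest, a rank-based correlation is one way to check whether the Pearson estimate is being shaped by those outliers. This is a sensitivity check that is not part of the original analysis, and the vectors below are made up purely to show how the two measures can diverge:

```r
# Made-up data: perfectly monotone, but with one extreme value
x <- 1:10
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Pearson measures linear association and is sensitive to the outlier
pearson <- cor(x, y, method = "pearson")

# Spearman works on ranks, so the outlier has no extra leverage
spearman <- cor(x, y, method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

On the real data the corresponding call would be `cor.test(data_clean$obesity_change, data_clean$diabetes_change, method = "spearman")`.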
data_clean <- data_clean %>%
mutate(res_2021 = residuals(model_2021))
top_positive_res <- data_clean %>%
arrange(desc(res_2021)) %>%
head(5) %>%
select(Country, obesity_2021, diabetes_2021, res_2021)
print(top_positive_res)
## # A tibble: 5 × 4
## Country obesity_2021 diabetes_2021 res_2021
## <chr> <dbl> <dbl> <dbl>
## 1 Pakistan 20.7 30.8 22.5
## 2 Mauritius 19.1 22.6 14.6
## 3 Sudan 14.8 18.9 11.8
## 4 Kuwait 45 24.9 11.5
## 5 Solomon Islands 21.2 19.8 11.4
These nations show substantially higher diabetes rates than predicted by their obesity levels. Pakistan has the largest positive residual, with a diabetes prevalence about 22.5 percentage points above what its obesity level would suggest.
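A residual is simply the observed value minus the value predicted by the fitted line. As a worked check, the sketch below recomputes Pakistan's residual from the coefficients reported in `summary(model_2021)` and the values in the table above:

```r
# Coefficients copied from summary(model_2021)
intercept <- 4.07565
slope <- 0.20621

# Pakistan's 2021 values from the residual table
obesity_pak <- 20.7
diabetes_pak <- 30.8

# Predicted diabetes prevalence at Pakistan's obesity level
predicted <- intercept + slope * obesity_pak

# Residual: observed minus predicted
residual <- diabetes_pak - predicted

print(round(c(predicted = predicted, residual = residual), 2))
```

The result agrees with the maximum residual reported in the model summary, up to rounding of the coefficients.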
top_negative_res <- data_clean %>%
arrange(res_2021) %>%
head(5) %>%
select(Country, obesity_2021, diabetes_2021, res_2021)
print(top_negative_res)
## # A tibble: 5 × 4
## Country obesity_2021 diabetes_2021 res_2021
## <chr> <dbl> <dbl> <dbl>
## 1 Samoa 60.6 9.2 -7.37
## 2 Ireland 30.4 3 -7.34
## 3 Croatia 34.7 4.8 -6.43
## 4 Georgia 38 5.7 -6.21
## 5 Mauritania 20.2 2.1 -6.14
These nations show substantially lower diabetes rates than predicted by their obesity levels. Samoa and Ireland have the largest negative residuals, each about 7.3 to 7.4 percentage points below the fitted line.
Our analysis shows that both diabetes and obesity prevalence rose between 2011 and 2021, consistent with a growing global burden. However, the size of the increase varies considerably across the countries studied.
By tracking data from 2011 to 2021, we found that national obesity rates are a moderate indicator of current diabetes prevalence (r ≈ 0.52), while changes in obesity show essentially no linear association with changes in diabetes over the decade. This suggests that the diabetes epidemic is driven by a complex interplay of factors, which likely include genetics, dietary habits, and socioeconomic conditions that this analysis did not measure.