This project analyzes a Diabetes Dataset using SQL and Tableau. The goal is to uncover insights about diabetes risk factors, patient demographics, and health indicators such as HbA1c levels, BMI, Cholesterol, and Age groups. The analysis helps in understanding trends in diabetes and visualizing key patterns for better decision-making.
The dataset contains medical records with attributes such as:
- AGE - Patient's age
- GENDER - Male/Female
- HbA1c - Blood sugar levels (important for diabetes diagnosis)
- BMI - Body Mass Index
- Cholesterol, LDL, HDL - Lipid profile indicators
- CLASS - Diabetes classification (
Y
= Diabetic,N
= Non-Diabetic,P
= Pre-Diabetic)
Before performing analysis, we first cleaned the dataset using SQL:
To check for missing values in the dataset:
SELECT * FROM DiabetesDataset
WHERE AGE IS NULL OR GENDER IS NULL OR HbA1c IS NULL OR BMI IS NULL;
To replace NULL values with appropriate defaults:
UPDATE DiabetesDataset
SET BMI = (SELECT AVG(BMI) FROM DiabetesDataset WHERE BMI IS NOT NULL)
WHERE BMI IS NULL;
To replace inconsistent gender values (F
, f
β F
, M
, m
β M
):
UPDATE DiabetesDataset
SET GENDER = 'F' WHERE GENDER = 'f';
UPDATE DiabetesDataset
SET GENDER = 'M' WHERE GENDER = 'm';
Check for duplicates:
SELECT ID, COUNT(*)
FROM DiabetesDataset
GROUP BY ID
HAVING COUNT(*) > 1;
Remove duplicates while keeping the first occurrence:
DELETE FROM DiabetesDataset
WHERE ID NOT IN (
SELECT MIN(ID)
FROM DiabetesDataset
GROUP BY No_Pation, AGE, GENDER, HbA1c, BMI
);
- Used
COUNT()
,AVG()
, andGROUP BY
to understand the dataset. - Analyzed age distribution and average health indicators.
- Identified correlations between HbA1c, BMI, and Cholesterol levels.
SELECT CLASS, COUNT(*) AS total_patients
FROM DiabetesDataset
GROUP BY CLASS;
SELECT
CASE
WHEN AGE <= 20 THEN '0-20'
WHEN AGE <= 40 THEN '21-40'
WHEN AGE <= 60 THEN '41-60'
ELSE '61-80'
END AS AgeGroup,
AVG(HbA1c) AS Avg_HbA1c
FROM DiabetesDataset
GROUP BY AgeGroup;
SELECT ID, AGE, BMI, HbA1c
FROM DiabetesDataset
WHERE HbA1c > 6.5 AND BMI > 30
ORDER BY HbA1c DESC;
We created interactive dashboards in Tableau to visualize key insights. Some of the most important visualizations include:
- A histogram showing the distribution of ages among patients.
- A scatter plot identifying relationships between HbA1c levels and BMI, helping to detect high-risk individuals.
- A bar chart showing the proportion of male vs. female patients.
- A line chart tracking trends in cholesterol levels with age.
- A pie chart breaking down diabetic, pre-diabetic, and non-diabetic patients.
View the interactive Tableau dashboard here: Diabetes Analysis Tableau Dashboard
-
SQL Analysis
- Import the dataset into PostgreSQL, MySQL, or any SQL database.
- Run the SQL scripts provided in the
SQLL.SQL
file.
-
Tableau Visualization
- Open Tableau Public/Desktop.
- Load the dataset (
Dataset of Diabetes CSV
). - Import the pre-built Tableau workbook (
Diabetes.twb
).
- Patients with high HbA1c and BMI are more likely to be diabetic.
- The majority of patients fall within the 40-60 age group.
- Gender-based patterns in diabetes prevalence can be explored further.
- Cholesterol levels increase with age, and LDL trends correlate with diabetes risk.
Feel free to contribute by improving the analysis, adding visualizations, or refining the queries! Open a pull request if you have any suggestions.
For any questions or discussions, reach out via GitHub Issues or email.