This Jupyter notebook provides a comprehensive exploration of a dataset related to Polycystic Ovary Syndrome (PCOS). The analysis includes importing necessary libraries, loading the dataset, examining descriptive statistics, handling missing values, and exploring distributions through various plots and statistical tests. The focus is on understanding the relationships between different variables and their impact on PCOS.
-
Importing Libraries and Loading Data
- Libraries such as pandas, matplotlib, seaborn, and numpy are imported, and the dataset is loaded for analysis.
-
Descriptive Statistics
- Summary statistics and basic information about the dataset are displayed, including mean, standard deviation, and other descriptive measures for numerical variables.
-
Missing Values
- Missing values are identified and visualized using a heatmap. Unnecessary columns are removed, and rows with missing values are dropped.
-
Distributions and Plots
- Various plots, including count plots and distribution plots, are created to explore the distribution of variables like age and BMI.
-
Variance and Standard Deviation
- Concepts of variance and standard deviation are explained using the 'Age' variable, and a scatter plot with a linear fit is provided.
-
Statistical Tests
- T-tests are performed to compare the mean age and BMI between different groups (PCOS vs. non-PCOS). The results are visualized through histograms.
-
More Visualizations
- Correlation heatmap and pair plots are generated to visualize relationships between selected variables. A box plot is used to show the distribution of BMI by PCOS status.
-
Odds Ratio
- The odds ratio is calculated to understand the association between PCOS and weight gain. A count plot is created to visualize the relationship.
-
Adjusted Odds Ratio
- Logistic regression is applied to calculate adjusted odds ratios for variables such as PCOS, regular exercise, and fast food consumption. Results are interpreted with confidence intervals.
-
Relative Risk
- A cross-tabulation table is created to calculate and visualize the relative risk between fast food consumption and weight gain.
This notebook provides a comprehensive analysis of the PCOS dataset, covering exploratory data analysis, statistical concepts, and visualizations. The findings contribute to a better understanding of the relationships between variables and their potential impact on PCOS and related conditions. Researchers and data analysts can leverage this analysis as a foundation for further investigations or interventions related to PCOS.