forked from cregouby/x2y
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
64 lines (45 loc) · 1.53 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# x2y
<!-- badges: start -->
<!-- badges: end -->
The goal of {x2y} is to provide column-to-column mutual information gain for large dataframe, so that you can
remove noise or redundant column based on their information gain for a machine learning modeling.
This is a generalization of the fantastic [RViews blog post from Rama Ramakrishnan](https://rviews.rstudio.com/2021/04/15/an-alternative-to-the-correlation-coefficient-that-works-for-numeric-and-categorical-variables/)
## Installation
You can install the development version of x2y from [GitHub](https://github.com/) with:
``` r
# install.packages("pak")
pak::pak("cregouby/x2y")
```
## Example
This is a basic example which shows you how to solve a common problem:
```{r example}
library(x2y)
library(ggplot2)
## basic example code
set.seed(42)
x <- seq(-1,1,0.01)
circular_df <- data.frame(x = x,
y = sqrt(1 - x^2) + rnorm(length(x),mean = 0, sd = 0.05),
z = rnorm(length(x))
)
ggplot(circular_df, aes(x = x, y = y)) +
geom_point()
```
Here is the mutual information gain provided by {x2y} for this example
```{r circular df}
dx2y(circular_df)
```
## Related work
- [{lares} package](https://laresbernardo.github.io/lares/) is having the `x2y()` function from the same inspiration.