fix bug in lqas_get_class_prob(); fix #66 #67

Merged 1 commit, Dec 26, 2024
2 changes: 1 addition & 1 deletion R/04-test_classifier.R
@@ -248,7 +248,7 @@ lqas_get_class_prob <- function(x) {
   ## Create confusion matrix ----
   x[[1]]$true <- cut(
     x[[1]]$proportion * 100,
-    breaks = c(0, x$dLower, x$dUpper, 100),
+    breaks = c(0, x$dLower * 100, x$dUpper * 100, 100),
     labels = c(1, 2, 3)
   )
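
The effect of this one-line change can be reproduced in isolation. The sketch below uses made-up proportions and thresholds (not taken from the package or its tests) to show what happens when `cut()` receives percentages as input but breaks that mix proportions with percentages.

```r
# Illustrative values only: three simulated coverage proportions and the
# classifier thresholds expressed as proportions
proportion <- c(0.30, 0.75, 0.95)
d_lower <- 0.6
d_upper <- 0.9

# Before the fix: input is in percent, but the inner breaks are proportions,
# so almost every value lands in the top interval (0.9, 100]
before <- cut(
  proportion * 100,
  breaks = c(0, d_lower, d_upper, 100),
  labels = c(1, 2, 3)
)

# After the fix: input and breaks are both in percent
after <- cut(
  proportion * 100,
  breaks = c(0, d_lower * 100, d_upper * 100, 100),
  labels = c(1, 2, 3)
)

as.character(before)  # "3" "3" "3"
as.character(after)   # "1" "2" "3"
```

With unscaled breaks the "true" class is almost always 3, which would distort any confusion matrix built from it.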

49 changes: 22 additions & 27 deletions R/data.R
@@ -1,47 +1,42 @@
################################################################################
#
#'
#' List of villages in Bo District, Sierra Leone
#'
#' @format A tibble with 1001 rows and 4 columns:
#' \describe{
#' \item{`id`}{Unique identifier}
#' \item{`chiefdom`}{Chiefdom}
#' \item{`section`}{Section}
#' \item{`village`}{Village}
#' }
#'
#' @format A tibble with 1001 rows and 4 columns
#'
#' **Variable** | **Description**
#' :--- | :---
#' *id* | Unique identifier
#' *chiefdom* | Chiefdom
#' *section* | Section
#' *village* | Village
#'
#' @source Ministry of Health, Sierra Leone
#'
#' @examples
#' village_list
#'
#
################################################################################
#'

"village_list"


################################################################################
#
#'
#' SLEAC survey data from Sierra Leone
#'
#' @format A tibble with 14 rows and 6 columns:
#' \describe{
#' \item{`country`}{Country}
#' \item{`province`}{Province}
#' \item{`district`}{District}
#' \item{`in_cases`}{Cases found who are in the programme}
#' \item{`out_cases`}{Cases found who are not in the programme}
#' \item{`n`}{Total number of under 5 children sampled}
#' }
#' @format A tibble with 14 rows and 6 columns
#'
#' **Variable** | **Description**
#' :--- | :---
#' *country* | Country
#' *province* | Province
#' *district* | District
#' *in_cases* | Cases found who are in the programme
#' *out_cases* | Cases found who are not in the programme
#' *n* | Total number of under 5 children sampled
#'
#' @source Ministry of Health, Sierra Leone
#'
#' @examples
#' survey_data
#'
#
################################################################################
"survey_data"

"survey_data"
46 changes: 46 additions & 0 deletions README.Rmd
@@ -11,6 +11,8 @@ knitr::opts_chunk$set(
fig.path = "man/figures/README-",
out.width = "100%"
)

library(sleacr)
```

# sleacr: Simplified Lot Quality Assurance Sampling Evaluation of Access and Coverage (SLEAC) Tools <img src="man/figures/logo.png" width="200px" align="right" />
@@ -35,6 +37,8 @@ The `{sleacr}` package provides functions that facilitate the design, sampling,

* Functions to draw a stage 1 sample for a SLEAC survey;

* Functions to classify coverage; and,

* Functions to determine the performance of chosen classifier cut-offs for analysis of SLEAC survey data.

## Installation
@@ -82,7 +86,49 @@ In this updated sampling plan, the decision rule is now more than 10 SAM cases b

### Stage 1 sample

The first stage sample of a SLEAC survey is a systematic spatial sample. Two methods can be used and both methods take the sample from all parts of the survey area: the *list-based* method and the *map-based* method. The `{sleacr}` package currently supports the implementation of the *list-based* method.

In the list-based method, communities to be sampled are selected systematically from a complete list of communities in the survey area. This list of communities should be sorted by one or more non-overlapping spatial factors such as district and subdistricts within districts. The `village_list` dataset is an example of such a list.

```{r}
village_list
```

The `get_sampling_list()` function implements the list-based sampling method. For example, if 40 clusters/villages need to be sampled to find the 19 SAM cases calculated earlier, a sampling list can be created as follows:

```{r stage-1-sample, eval = FALSE}
get_sampling_list(village_list, 40)
```

which provides the following sampling list:

```{r stage-1-sample-show, echo = FALSE}
get_sampling_list(village_list, 40) |>
knitr::kable()
```

### Classifying coverage

With data collected from a SLEAC survey, the `lqas_classify_coverage()` function is used to classify coverage. For example, using the `survey_data` dataset, per district coverage classification can be calculated as follows:

```{r classify-coverage, eval = FALSE}
with(survey_data, lqas_classify_coverage(n = in_cases, n_total = n))
```

which outputs the following results:

```{r classify-coverage-show, echo = FALSE}
with(survey_data, lqas_classify_coverage(n = in_cases, n_total = n))
```

### Assessing classifier performance

It is useful to be able to assess the performance of the classifier chosen for a SLEAC survey. For example, in the context presented above of an area with a population of 600, a sample size of 40 and a 60% and 90% threshold classifier, the performance of this classifier can be assessed by first simulating a population and then determining the classification probabilities of the chosen classifier on this population.

```{r classifier-test}
lqas_simulate_test(pop = 600, n = 40, dLower = 0.6, dUpper = 0.9) |>
lqas_get_class_prob()
```

## Citation

168 changes: 162 additions & 6 deletions README.md
@@ -42,6 +42,8 @@ following:

- Functions to draw a stage 1 sample for a SLEAC survey;

- Functions to classify coverage; and,

- Functions to determine the performance of chosen classifier cut-offs
for analysis of SLEAC survey data.

@@ -64,7 +66,7 @@ install.packages(

To set up an LQAS sampling frame, a target sample size is first
estimated. For example, if the survey area has an estimated population
- of about 600 severe acute malnourished (SAME) children and you want to
+ of about 600 severe acute malnourished (SAM) children and you want to
assess whether coverage is reaching at least 50%, the sample size can be
calculated as follows:

@@ -95,6 +97,159 @@ alpha and beta errors no more than 10%. The alpha and beta errors
requirement is set at no more than 10% by default. This can be made more
precise by setting alpha and beta errors less than 10%.

There are contexts where survey data has already been collected and the
sample is less than what was aimed for based on the original sampling
frame. The `get_sample_d()` function is used to determine the error
levels of the achieved sample size. For example, if the survey described
above only achieved a sample size of 16, the `get_sample_d()` function
can be used as follows:

``` r
get_sample_d(N = 600, n = 16, dLower = 0.5, dUpper = 0.8)
```

which gives an alternative LQAS sampling plan based on the achieved
sample size.

#> $n
#> [1] 16
#>
#> $d
#> [1] 10
#>
#> $alpha
#> [1] 0.07890285
#>
#> $beta
#> [1] 0.1019738

In this updated sampling plan, the decision rule is now more than 10 SAM
cases but with higher alpha and beta errors. Note that the beta error is
now slightly higher than 10%.
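
The error levels of a plan like this one can be explored with the hypergeometric distribution in base R. The sketch below is one plausible way to evaluate a decision rule `d` for a sample of `n` from a population of `N`. Whether `get_sample_d()` computes its alpha and beta values in exactly this way is an assumption, and the variable names are illustrative.

``` r
N <- 600        # estimated population of SAM cases
n_sample <- 16  # achieved sample size
d <- 10         # decision rule: coverage is classified as met if more than d cases are covered
d_lower <- 0.5
d_upper <- 0.8

# Number of covered cases in the population at each threshold
covered_upper <- round(N * d_upper)
covered_lower <- round(N * d_lower)

# Probability of finding d or fewer covered cases when true coverage is d_upper
# (coverage is actually high but the sample fails the decision rule)
p_fail_when_high <- phyper(
  d, m = covered_upper, n = N - covered_upper, k = n_sample
)

# Probability of finding more than d covered cases when true coverage is d_lower
# (coverage is actually low but the sample passes the decision rule)
p_pass_when_low <- phyper(
  d, m = covered_lower, n = N - covered_lower, k = n_sample,
  lower.tail = FALSE
)

c(p_fail_when_high = p_fail_when_high, p_pass_when_low = p_pass_when_low)
```

Rerunning this with a larger `n_sample` and a correspondingly larger `d` shows how both misclassification probabilities shrink as the sample grows.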

### Stage 1 sample

The first stage sample of a SLEAC survey is a systematic spatial sample.
Two methods can be used and both methods take the sample from all parts
of the survey area: the *list-based* method and the *map-based* method.
The `{sleacr}` package currently supports the implementation of the
*list-based* method.

In the list-based method, communities to be sampled are selected
systematically from a complete list of communities in the survey area.
This list of communities should be sorted by one or more non-overlapping
spatial factors such as district and subdistricts within districts. The
`village_list` dataset is an example of such a list.

``` r
village_list
#> # A tibble: 1,001 × 4
#> id chiefdom section village
#> <dbl> <chr> <chr> <chr>
#> 1 1 Badjia Damia Ngelehun
#> 2 2 Badjia Damia Gondama
#> 3 3 Badjia Damia Penjama
#> 4 4 Badjia Damia Jawe
#> 5 5 Badjia Damia Dambala
#> 6 6 Badjia Fallay Bumpewo
#> 7 7 Badjia Fallay Pelewahun
#> 8 8 Badjia Fallay Pendembu
#> 9 9 Badjia Kpallay Jokibu
#> 10 10 Badjia Kpallay Kpaku
#> # ℹ 991 more rows
```

The `get_sampling_list()` function implements the list-based sampling
method. For example, if 40 clusters/villages need to be sampled to
find the 19 SAM cases calculated earlier, a sampling list can be created
as follows:

``` r
get_sampling_list(village_list, 40)
```

which provides the following sampling list:

| id | chiefdom | section | village |
| --: | :------------ | :------------- | :---------- |
| 13 | Badjia | Kpallay | Kugbahun |
| 38 | Bagbe | Jongo | Bandajuma |
| 63 | Bagbe | Nyallay | Fuinda |
| 88 | Bagbo | Gorapon | Kassay |
| 113 | Bagbo | Kpangbalia | Kpangbalia |
| 138 | Bagbo | Tissawa | Monjemei |
| 163 | Baoma | Bambawo | Feiba |
| 188 | Baoma | Mawojeh | Masao |
| 213 | Baoma | Upper Pataloo | Komende |
| 238 | Bumpe Ngao | Bumpe | Nguabu |
| 263 | Bumpe Ngao | Bumpe | Sembehun |
| 288 | Bumpe Ngao | Sewama | Juhun |
| 313 | Bumpe Ngao | Sahn | Sembehun |
| 338 | Bumpe Ngao | Taninahun | Nyandehun |
| 363 | Bumpe Ngao | Taninahun | Waterloo |
| 388 | Bumpe Ngao | Taninahun | Kangama |
| 413 | Bumpe Ngao | Yengema | Yengema |
| 438 | Gbo | Maryu | Kama |
| 463 | Jaiama Bongor | Lower Kama | Bangema |
| 488 | Jaiama Bongor | Tongowa | Lalewahun |
| 513 | Jaiama Bongor | Upper Kama | Bowohun |
| 538 | Kakua | Kpandobu | Manguama |
| 563 | Kakua | Nguabu | Gandorhun |
| 588 | Kakua | Samamie | Gbanja Town |
| 613 | Komboya | Kemoh | Manyama |
| 638 | Komboya | Mangaru | Kpamajama |
| 663 | Lugbu | Yalenga | Kpetema |
| 688 | Niawa Lenga | Kaduawo | Huawuma |
| 713 | Niawa Lenga | Yalenga | Kpah |
| 738 | Selenga | Mambawa | Gbangaima |
| 763 | Selenga | Old Town | Korwama |
| 788 | Tikonko | Seiwa | Kapima |
| 813 | Tikonko | Njagbla II | Failor |
| 838 | Tikonko | Seiwa | Gbanahun |
| 863 | Valunia | Deilenga | Konima |
| 888 | Valunia | Kendebu | Kpetema |
| 913 | Valunia | Lunia | Levuma |
| 938 | Valunia | Lunia | Njala |
| 963 | Valunia | Seilenga | Foya |
| 988 | Wonde | Central Kargoi | YawaJu |
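
The ids in the sampling list above are 25 apart, which is what a systematic sample with a fixed interval and a random start would produce. The sketch below illustrates that general technique in base R. The helper `systematic_sample_sketch()` and the stand-in data frame are hypothetical and are not part of `{sleacr}`; the package's own implementation may differ.

``` r
# Hypothetical helper illustrating systematic sampling from a sorted list
systematic_sample_sketch <- function(x, n) {
  N <- nrow(x)
  interval <- floor(N / n)               # sampling interval
  start <- sample(seq_len(interval), 1)  # random start within the first interval
  idx <- seq(from = start, by = interval, length.out = n)
  x[idx, , drop = FALSE]
}

# Stand-in for a sorted village list with 1001 rows
villages <- data.frame(
  id = 1:1001,
  village = paste0("village_", 1:1001)
)

set.seed(1)
sampled <- systematic_sample_sketch(villages, 40)

nrow(sampled)             # number of selected villages
unique(diff(sampled$id))  # spacing between consecutive selections
```

Because the list is sorted by spatial factors, taking every 25th row spreads the selected villages across all parts of the survey area.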

### Classifying coverage

With data collected from a SLEAC survey, the `lqas_classify_coverage()`
function is used to classify coverage. For example, using the
`survey_data` dataset, per district coverage classification can be
calculated as follows:

``` r
with(survey_data, lqas_classify_coverage(n = in_cases, n_total = n))
```

which outputs the following results:

#> [1] "Low" "Low" "Low" "Low" "Low"
#> [6] "Low" "Low" "Moderate" "Moderate" "Moderate"
#> [11] "Low" "Low" "Low" "Low"
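
A three-class LQAS classification can be thought of as comparing the number of covered cases against two decision thresholds derived from the sample size. The sketch below illustrates that idea. The helper `classify_sketch()`, the 20% and 50% thresholds, and the example counts are all assumptions made for illustration and do not describe the internals or defaults of `lqas_classify_coverage()`.

``` r
# Hypothetical helper: classify coverage using two decision thresholds
classify_sketch <- function(cases_in, cases_total, thresholds = c(0.2, 0.5)) {
  d_lower <- floor(cases_total * thresholds[1])  # lower decision threshold
  d_upper <- floor(cases_total * thresholds[2])  # upper decision threshold
  ifelse(
    cases_in > d_upper, "High",
    ifelse(cases_in > d_lower, "Moderate", "Low")
  )
}

# Illustrative counts: three survey areas, each with 40 cases sampled
classify_sketch(cases_in = c(3, 12, 30), cases_total = c(40, 40, 40))
```

With 40 cases sampled the thresholds are 8 and 20, so the three areas are classified as "Low", "Moderate" and "High" respectively.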

### Assessing classifier performance

It is useful to be able to assess the performance of the classifier
chosen for a SLEAC survey. For example, in the context presented above
of an area with a population of 600, a sample size of 40 and a 60% and
90% threshold classifier, the performance of this classifier can be
assessed by first simulating a population and then determining the
classification probabilities of the chosen classifier on this
population.

``` r
lqas_simulate_test(pop = 600, n = 40, dLower = 0.6, dUpper = 0.9) |>
lqas_get_class_prob()
#> Low : 0.9562
#> Moderate : 0.8288
#> High : 0.8393
#> Overall : 0.9063
#> Gross misclassification : 0
```
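
The idea behind this assessment can be sketched in base R: simulate many surveys at known coverage levels, classify each one, and tabulate how often the classification matches the true class. The code below is a simplified illustration under stated assumptions (hypergeometric sampling, decision thresholds taken as `floor(n * threshold)`) and is not the implementation used by `lqas_simulate_test()` or `lqas_get_class_prob()`.

``` r
set.seed(42)

pop <- 600
n_sample <- 40
d_lower <- 0.6
d_upper <- 0.9

# Decision thresholds on the number of covered cases found in the sample
d <- floor(n_sample * c(d_lower, d_upper))

# True coverage levels (in percent) to test, 50 simulated surveys per level
pct <- rep(seq(5, 95, by = 5), each = 50)

# One hypergeometric draw per simulated survey
covered <- round(pop * pct / 100)
cases_in <- rhyper(length(pct), m = covered, n = pop - covered, k = n_sample)

# True class from the known coverage; breaks and input are both in percent
true_class <- cut(
  pct,
  breaks = c(0, d_lower * 100, d_upper * 100, 100),
  labels = c("Low", "Moderate", "High")
)

# Predicted class from the simulated sample counts
pred_class <- cut(
  cases_in,
  breaks = c(-Inf, d[1], d[2], Inf),
  labels = c("Low", "Moderate", "High")
)

confusion <- table(true = true_class, predicted = pred_class)
confusion

# Probability of correct classification within each true class
diag(confusion) / rowSums(confusion)
```

The top-right and bottom-left cells of the confusion matrix count gross misclassifications, where low coverage is classified as high or the reverse.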

## Citation

If you use `{sleacr}` in your work, please cite using the suggested
@@ -105,11 +260,12 @@ citation("sleacr")
#> To cite sleacr in publications use:
#>
#> Mark Myatt, Ernest Guevarra, Lionella Fieschi, Allison
- #> Norris, Saul Guerrero, Lilly Schofield, Daniel Jones, Ephrem
- #> Emru, Kate Sadler (2012). _Semi-Quantitative Evaluation of
- #> Access and Coverage (SQUEAC)/Simplified Lot Quality
- #> Assurance Sampling Evaluation of Access and Coverage (SLEAC)
- #> Technical Reference_. FHI 360/FANTA, Washington, DC.
+ #> Norris, Saul Guerrero, Lilly Schofield, Daniel Jones,
+ #> Ephrem Emru, Kate Sadler (2012). _Semi-Quantitative
+ #> Evaluation of Access and Coverage (SQUEAC)/Simplified Lot
+ #> Quality Assurance Sampling Evaluation of Access and
+ #> Coverage (SLEAC) Technical Reference_. FHI 360/FANTA,
+ #> Washington, DC.
#> <https://www.fantaproject.org/sites/default/files/resources/SQUEAC-SLEAC-Technical-Reference-Oct2012_0.pdf>.
#>
#> A BibTeX entry for LaTeX users is