nutriverse · ernestguevarra · Dec 26, 2024 · Dec 26, 2024
diff --git a/R/04-test_classifier.R b/R/04-test_classifier.R
@@ -248,7 +248,7 @@ lqas_get_class_prob <- function(x) {
   ## Create confusion matrix ----
   x[[1]]$true <- cut(
     x[[1]]$proportion * 100,
-    breaks = c(0, x$dLower, x$dUpper, 100),
+    breaks = c(0, x$dLower * 100, x$dUpper * 100, 100),
     labels = c(1, 2, 3)
   )
 

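A note on the `lqas_get_class_prob()` hunk above: `x[[1]]$proportion * 100` puts true coverage on a 0 to 100 scale, so the breaks passed to `cut()` need to be on that scale as well. The sketch below is not package code. The vector `proportion` and the thresholds `d_lower` and `d_upper` are made-up stand-ins for `x[[1]]$proportion`, `x$dLower` and `x$dUpper`, and it assumes the thresholds are stored as proportions (as in the README call with `dLower = 0.6, dUpper = 0.9`).

```r
# Hypothetical inputs for illustration only (not taken from the package)
proportion <- c(0.35, 0.72, 0.95)  # true coverage as proportions
d_lower <- 0.6                     # lower threshold as a proportion
d_upper <- 0.9                     # upper threshold as a proportion

# Old breaks: thresholds stay on the 0-1 scale while values are on 0-100,
# so almost every value falls into the last interval (0.9, 100]
old_class <- cut(
  proportion * 100,
  breaks = c(0, d_lower, d_upper, 100),
  labels = c(1, 2, 3)
)

# New breaks: thresholds rescaled to 0-100 to match the values being cut
new_class <- cut(
  proportion * 100,
  breaks = c(0, d_lower * 100, d_upper * 100, 100),
  labels = c(1, 2, 3)
)

as.character(old_class)  # "3" "3" "3"
as.character(new_class)  # "1" "2" "3"
```

With mismatched scales the "true" class collapses to 3 for these inputs, which would distort the confusion matrix built from it.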
diff --git a/R/data.R b/R/data.R
@@ -1,47 +1,42 @@
-################################################################################
-#
 #'
 #' List of villages in Bo District, Sierra Leone
 #'
-#' @format A tibble with 1001 rows and 4 columns:
-#' \describe{
-#'   \item{`id`}{Unique identifier}
-#'   \item{`chiefdom`}{Chiefdom}
-#'   \item{`section`}{Section}
-#'   \item{`village`}{Village}
-#' }
-#'
+#' @format A tibble with 1001 rows and 4 columns
+#' 
+#' **Variable** | **Description**
+#' :--- | :---
+#' *id* | Unique identifier
+#' *chiefdom* | Chiefdom
+#' *section* | Section
+#' *village* | Village
+#' 
 #' @source Ministry of Health, Sierra Leone
 #'
 #' @examples
 #' village_list
-#'
-#
-################################################################################
+#' 
+
 "village_list"
 
 
-################################################################################
-#
 #'
 #' SLEAC survey data from Sierra Leone
 #'
-#' @format A tibble with 14 rows and 6 columns:
-#' \describe{
-#'   \item{`country`}{Country}
-#'   \item{`province`}{Province}
-#'   \item{`district`}{District}
-#'   \item{`in_cases`}{Cases found who are in the programme}
-#'   \item{`out_cases`}{Cases found who are not in the programme}
-#'   \item{`n`}{Total number of under 5 children sampled}
-#' }
+#' @format A tibble with 14 rows and 6 columns
+#' 
+#' **Variable** | **Description**
+#' :--- | :---
+#' *country* | Country
+#' *province* | Province
+#' *district* | District
+#' *in_cases* | Cases found who are in the programme
+#' *out_cases* | Cases found who are not in the programme
+#' *n* | Total number of under 5 children sampled
 #'
 #' @source Ministry of Health, Sierra Leone
 #'
 #' @examples
 #' survey_data
 #'
-#
-################################################################################
-"survey_data"
 
+"survey_data"
diff --git a/README.Rmd b/README.Rmd
@@ -11,6 +11,8 @@ knitr::opts_chunk$set(
   fig.path = "man/figures/README-",
   out.width = "100%"
 )
+
+library(sleacr)
 ```
 
 # sleacr: Simplified Lot Quality Assurance Sampling Evaluation of Access and Coverage (SLEAC) Tools <img src="man/figures/logo.png" width="200px" align="right" />
@@ -35,6 +37,8 @@ The `{sleacr}` package provides functions that facilitate the design, sampling,
 
 * Functions to draw a stage 1 sample for a SLEAC survey;
 
+* Functions to classify coverage; and,
+
 * Functions to determine the performance of chosen classifier cut-offs for analysis of SLEAC survey data.
 
 ## Installation
@@ -82,7 +86,49 @@ In this updated sampling plan, the decision rule is now more than 10 SAM cases b
 
 ### Stage 1 sample
 
+The first stage sample of a SLEAC survey is a systematic spatial sample. Two methods can be used and both methods take the sample from all parts of the survey area: the *list-based* method and the *map-based* method. The `{sleacr}` package currently supports the implementation of the *list-based* method.
+
+In the list-based method, communities to be sampled are selected systematically from a complete list of communities in the survey area. This list of communities should be sorted by one or more non-overlapping spatial factors such as district and subdistricts within districts. The `village_list` dataset is an example of such a list.
+
+```{r}
+village_list
+```
+
+The `get_sampling_list()` function implements the list-based sampling method. For example, if 40 clusters/villages need to be sampled to find the 19 SAM cases calculated earlier, a sampling list can be created as follows:
+
+```{r stage-1-sample, eval = FALSE}
+get_sampling_list(village_list, 40)
+```
+
+which provides the following sampling list:
+
+```{r stage-1-sample-show, echo = FALSE}
+get_sampling_list(village_list, 40) |>
+  knitr::kable()
+```
 
+### Classifying coverage
+
+With data collected from a SLEAC survey, the `lqas_classify_coverage()` function is used to classify coverage. For example, using the `survey_data` dataset, per-district coverage classification can be calculated as follows:
+
+```{r classify-coverage, eval = FALSE}
+with(survey_data, lqas_classify_coverage(n = in_cases, n_total = n))
+```
+
+which outputs the following results:
+
+```{r classify-coverage-show, echo = FALSE}
+with(survey_data, lqas_classify_coverage(n = in_cases, n_total = n))
+```
+
+### Assessing classifier performance
+
+It is useful to be able to assess the performance of the classifier chosen for a SLEAC survey. For example, in the context presented above of an area with a population of 600, a sample size of 40 and a 60% and 90% threshold classifier, the performance of this classifier can be assessed by first simulating a population and then determining the classification probabilities of the chosen classifier on this population.
+
+```{r classifier-test}
+lqas_simulate_test(pop = 600, n = 40, dLower = 0.6, dUpper = 0.9) |>
+  lqas_get_class_prob()
+```
 
 ## Citation
 

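For readers of the stage 1 sampling text in this README change, the idea of a list-based systematic sample can be sketched in base R. This is only an illustration under stated assumptions, not the implementation of `get_sampling_list()`: the helper `systematic_sample_idx()` is hypothetical, and it assumes a fixed sampling interval of `floor(N / n)` with a random start inside the first interval. The ids in the rendered sampling list are 25 apart, which is consistent with `floor(1001 / 40)`.

```r
# Hypothetical helper (not part of {sleacr}): pick n row positions from a
# sorted list of N units using a fixed interval and a random start
systematic_sample_idx <- function(N, n) {
  stopifnot(n >= 1, N >= n)
  interval <- floor(N / n)           # sampling interval
  start <- sample.int(interval, 1)   # random start within the first interval
  seq(from = start, by = interval, length.out = n)
}

set.seed(1)  # only to make the illustration reproducible
idx <- systematic_sample_idx(N = 1001, n = 40)

length(idx)        # 40 positions
unique(diff(idx))  # constant interval of 25
```

The positions could then be used to subset a sorted list, for example `village_list[idx, ]`. Because the list is sorted by spatial factors, an evenly spaced selection spreads the sample across the whole survey area.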
diff --git a/README.md b/README.md
@@ -42,6 +42,8 @@ following:
 
   - Functions to draw a stage 1 sample for a SLEAC survey;
 
+  - Functions to classify coverage; and,
+
   - Functions to determine the performance of chosen classifier cut-offs
     for analysis of SLEAC survey data.
 
@@ -64,7 +66,7 @@ install.packages(
 
 To setup an LQAS sampling frame, a target sample size is first
 estimated. For example, if the survey area has an estimated population
-of about 600 severe acute malnourished (SAME) children and you want to
+of about 600 severe acute malnourished (SAM) children and you want to
 assess whether coverage is reaching at least 50%, the sample size can be
 calculated as follows:
 
@@ -95,6 +97,159 @@ alpha and beta errors no more than 10%. The alpha and beta errors
 requirement is set at no more than 10% by default. This can be made more
 precise by setting alpha and beta errors less than 10%.
 
+There are contexts where survey data has already been collected and the
+sample is less than what was aimed for based on the original sampling
+frame. The `get_sample_d()` function is used to determine the error
+levels of the achieved sample size. For example, if the survey described
+above only achieved a sample size of 16, the `get_sample_d()` function
+can be used as follows:
+
+``` r
+get_sample_d(N = 600, n = 16, dLower = 0.5, dUpper = 0.8)
+```
+
+which gives an alternative LQAS sampling plan based on the achieved
+sample size.
+
+    #> $n
+    #> [1] 16
+    #> 
+    #> $d
+    #> [1] 10
+    #> 
+    #> $alpha
+    #> [1] 0.07890285
+    #> 
+    #> $beta
+    #> [1] 0.1019738
+
+In this updated sampling plan, the decision rule is now more than 10 SAM
+cases but with higher alpha and beta errors. Note that the beta error is
+now slightly higher than 10%.
+
+### Stage 1 sample
+
+The first stage sample of a SLEAC survey is a systematic spatial sample.
+Two methods can be used and both methods take the sample from all parts
+of the survey area: the *list-based* method and the *map-based* method.
+The `{sleacr}` package currently supports the implementation of the
+*list-based* method.
+
+In the list-based method, communities to be sampled are selected
+systematically from a complete list of communities in the survey area.
+This list of communities should be sorted by one or more non-overlapping
+spatial factors such as district and subdistricts within districts. The
+`village_list` dataset is an example of such a list.
+
+``` r
+village_list
+#> # A tibble: 1,001 × 4
+#>       id chiefdom section village  
+#>    <dbl> <chr>    <chr>   <chr>    
+#>  1     1 Badjia   Damia   Ngelehun 
+#>  2     2 Badjia   Damia   Gondama  
+#>  3     3 Badjia   Damia   Penjama  
+#>  4     4 Badjia   Damia   Jawe     
+#>  5     5 Badjia   Damia   Dambala  
+#>  6     6 Badjia   Fallay  Bumpewo  
+#>  7     7 Badjia   Fallay  Pelewahun
+#>  8     8 Badjia   Fallay  Pendembu 
+#>  9     9 Badjia   Kpallay Jokibu   
+#> 10    10 Badjia   Kpallay Kpaku    
+#> # ℹ 991 more rows
+```
+
+The `get_sampling_list()` function implements the list-based sampling
+method. For example, if 40 clusters/villages need to be sampled to
+find the 19 SAM cases calculated earlier, a sampling list can be created
+as follows:
+
+``` r
+get_sampling_list(village_list, 40)
+```
+
+which provides the following sampling list:
+
+|  id | chiefdom      | section        | village     |
+| --: | :------------ | :------------- | :---------- |
+|  13 | Badjia        | Kpallay        | Kugbahun    |
+|  38 | Bagbe         | Jongo          | Bandajuma   |
+|  63 | Bagbe         | Nyallay        | Fuinda      |
+|  88 | Bagbo         | Gorapon        | Kassay      |
+| 113 | Bagbo         | Kpangbalia     | Kpangbalia  |
+| 138 | Bagbo         | Tissawa        | Monjemei    |
+| 163 | Baoma         | Bambawo        | Feiba       |
+| 188 | Baoma         | Mawojeh        | Masao       |
+| 213 | Baoma         | Upper Pataloo  | Komende     |
+| 238 | Bumpe Ngao    | Bumpe          | Nguabu      |
+| 263 | Bumpe Ngao    | Bumpe          | Sembehun    |
+| 288 | Bumpe Ngao    | Sewama         | Juhun       |
+| 313 | Bumpe Ngao    | Sahn           | Sembehun    |
+| 338 | Bumpe Ngao    | Taninahun      | Nyandehun   |
+| 363 | Bumpe Ngao    | Taninahun      | Waterloo    |
+| 388 | Bumpe Ngao    | Taninahun      | Kangama     |
+| 413 | Bumpe Ngao    | Yengema        | Yengema     |
+| 438 | Gbo           | Maryu          | Kama        |
+| 463 | Jaiama Bongor | Lower Kama     | Bangema     |
+| 488 | Jaiama Bongor | Tongowa        | Lalewahun   |
+| 513 | Jaiama Bongor | Upper Kama     | Bowohun     |
+| 538 | Kakua         | Kpandobu       | Manguama    |
+| 563 | Kakua         | Nguabu         | Gandorhun   |
+| 588 | Kakua         | Samamie        | Gbanja Town |
+| 613 | Komboya       | Kemoh          | Manyama     |
+| 638 | Komboya       | Mangaru        | Kpamajama   |
+| 663 | Lugbu         | Yalenga        | Kpetema     |
+| 688 | Niawa Lenga   | Kaduawo        | Huawuma     |
+| 713 | Niawa Lenga   | Yalenga        | Kpah        |
+| 738 | Selenga       | Mambawa        | Gbangaima   |
+| 763 | Selenga       | Old Town       | Korwama     |
+| 788 | Tikonko       | Seiwa          | Kapima      |
+| 813 | Tikonko       | Njagbla II     | Failor      |
+| 838 | Tikonko       | Seiwa          | Gbanahun    |
+| 863 | Valunia       | Deilenga       | Konima      |
+| 888 | Valunia       | Kendebu        | Kpetema     |
+| 913 | Valunia       | Lunia          | Levuma      |
+| 938 | Valunia       | Lunia          | Njala       |
+| 963 | Valunia       | Seilenga       | Foya        |
+| 988 | Wonde         | Central Kargoi | YawaJu      |
+
+### Classifying coverage
+
+With data collected from a SLEAC survey, the `lqas_classify_coverage()`
+function is used to classify coverage. For example, using the
+`survey_data` dataset, per-district coverage classification can be
+calculated as follows:
+
+``` r
+with(survey_data, lqas_classify_coverage(n = in_cases, n_total = n))
+```
+
+which outputs the following results:
+
+    #>  [1] "Low"      "Low"      "Low"      "Low"      "Low"     
+    #>  [6] "Low"      "Low"      "Moderate" "Moderate" "Moderate"
+    #> [11] "Low"      "Low"      "Low"      "Low"
+
+### Assessing classifier performance
+
+It is useful to be able to assess the performance of the classifier
+chosen for a SLEAC survey. For example, in the context presented above
+of an area with a population of 600, a sample size of 40 and a 60% and
+90% threshold classifier, the performance of this classifier can be
+assessed by first simulating a population and then determining the
+classification probabilities of the chosen classifier on this
+population.
+
+``` r
+lqas_simulate_test(pop = 600, n = 40, dLower = 0.6, dUpper = 0.9) |>
+  lqas_get_class_prob()
+#>                     Low : 0.9562
+#>                Moderate : 0.8288
+#>                    High : 0.8393
+#>                 Overall : 0.9063
+#> Gross misclassification : 0
+```
+
 ## Citation
 
 If you use `{sleacr}` in your work, please cite using the suggested
@@ -105,11 +260,12 @@ citation("sleacr")
 #> To cite sleacr in publications use:
 #> 
 #>   Mark Myatt, Ernest Guevarra, Lionella Fieschi, Allison
-#>   Norris, Saul Guerrero, Lilly Schofield, Daniel Jones, Ephrem
-#>   Emru, Kate Sadler (2012). _Semi-Quantitative Evaluation of
-#>   Access and Coverage (SQUEAC)/Simplified Lot Quality
-#>   Assurance Sampling Evaluation of Access and Coverage (SLEAC)
-#>   Technical Reference_. FHI 360/FANTA, Washington, DC.
+#>   Norris, Saul Guerrero, Lilly Schofield, Daniel Jones,
+#>   Ephrem Emru, Kate Sadler (2012). _Semi-Quantitative
+#>   Evaluation of Access and Coverage (SQUEAC)/Simplified Lot
+#>   Quality Assurance Sampling Evaluation of Access and
+#>   Coverage (SLEAC) Technical Reference_. FHI 360/FANTA,
+#>   Washington, DC.
 #>   <https://www.fantaproject.org/sites/default/files/resources/SQUEAC-SLEAC-Technical-Reference-Oct2012_0.pdf>.
 #> 
 #> A BibTeX entry for LaTeX users is