try to add r lib path

szcf-weiya · Jan 28, 2025 · 756b19d · 756b19d
1 parent 31fa8fc
commit 756b19d
Show file tree

Hide file tree

Showing 2 changed files with 6 additions and 1 deletion.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -45,5 +45,6 @@ jobs:
 
       - uses: julia-actions/julia-docdeploy@v1
         env:
+          LD_LIBRARY_PATH: /opt/R/${{ matrix.r-version }}/lib/R/lib
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # If authenticating with GitHub Actions token
           DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} # If authenticating with SSH deploy key
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -1,3 +1,7 @@
 # Kmeans Benchmarks
 
-Kmeans is a popular clustering algorithm.
+Clustering is a cornerstone of unsupervised machine learning, with the k-means algorithm standing as one of the most widely used methods for partitioning data into coherent groups. Its simplicity, interpretability, and adaptability have made it a staple in fields ranging from customer segmentation to bioinformatics. However, the performance and results of k-means can vary significantly depending on the implementation choices made by practitioners, including the software ecosystem (e.g., R, Julia, Python), the algorithmic variants employed (e.g., Lloyd’s algorithm, Hartigan-Wong, or scalable approximations like Mini-Batch k-means), and the initialization strategies (e.g., random seeding, k-means++, or density-based initialization). These choices impact not only computational efficiency but also the quality and stability of the resulting clusters.
+
+This project seeks to systematically benchmark and compare k-means implementations across different frameworks—focusing on R and Julia as representative languages for statistical computing and high-performance numerical analysis, respectively—while also evaluating the interplay between initialization methods and algorithmic variants. R, with its rich ecosystem of packages (e.g., `stats`, `ClusterR`), offers user-friendly tools optimized for statistical rigor, whereas Julia (particularly the package `Clustering`), leveraging its just-in-time (JIT) compilation and parallel computing capabilities, promises faster execution for large datasets. Beyond software comparisons, the study will assess how initialization techniques (e.g., naive random centroids vs. sophisticated seeding) influence convergence rates, cluster quality metrics (e.g., silhouette score, within-cluster sum of squares), and sensitivity to local optima.
+
+By quantifying trade-offs between computational speed, scalability, and cluster accuracy, this work aims to provide actionable insights for researchers and practitioners in selecting optimal k-means configurations tailored to their data size, dimensionality, and domain requirements. The findings will contribute to a deeper understanding of how algorithmic choices and software ecosystems shape the practical utility of this foundational clustering method.