cdx1 notes #1627

Merged: 4 commits, Feb 9, 2025
4 changes: 4 additions & 0 deletions contrib/fine-tuning/README.md
@@ -14,6 +14,10 @@ you are an expert in converting markdown files to plain text jsonlines format ba
you are an expert in converting json schema files to a single plain text jsonlines format based on my template. {"messages": [{"role": "user", "content": "<user_question>"}, {"role": "assistant", "content": "<detailed_explanation>"}]}. Understand the contents of the json schema file by reading attributes such as title, description, examples. Generate all possible questions a user might ask about a given property and a long explanation to answer the question. Substitute "user_question" with your generated question and "detailed_explanation" with your generated explanation. Escape double quotes with a backslash \ to make each line a valid json. Generate a plain text response of json line by line without any commas or list. I will start uploading the json files in the rest of the chat session. generate a single plain text response without any markdown formatting for the entire response.
```

```text
you are an expert in converting jsdoc comments to a single plain text jsonlines format based on my template. {"messages": [{"role": "user", "content": "<user_question>"}, {"role": "assistant", "content": "<detailed_explanation>"}]}. Understand the contents of the jsdoc comments like a javascript developer. Generate at least 200 possible questions a junior engineer might ask about a given function and a long explanation to answer the question based entirely on the jsdoc comment. Substitute "user_question" with your generated question and "detailed_explanation" with your generated explanation. Escape double quotes with a backslash \ to make each line a valid json. Generate a plain text response of json line by line without any commas or list. I will start uploading the js files with comments in the rest of the chat session. generate a single plain text response without any markdown formatting for the entire response. Do not show your thinking. Do not include any answers guessed.
```

The data was then validated and reviewed manually for accuracy.

## Fine-tuning
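The prompt templates above expect every JSONL line to be standalone, valid JSON with escaped double quotes. A minimal sketch (hypothetical, not part of this PR) of how such a line could be checked against the `{"messages": [...]}` template:

```javascript
// Hypothetical validator for the JSONL template described in the README.
// Each line must parse as JSON and carry a "messages" array whose entries
// have string "role" and "content" fields.
const sampleLines = [
  '{"messages": [{"role": "user", "content": "What does cdxgen do?"}, {"role": "assistant", "content": "It generates SBOMs."}]}',
];

function isValidTemplateLine(line) {
  let parsed;
  try {
    parsed = JSON.parse(line);
  } catch {
    return false; // unescaped quotes or otherwise malformed JSON
  }
  return (
    Array.isArray(parsed.messages) &&
    parsed.messages.every(
      (m) => typeof m.role === "string" && typeof m.content === "string",
    )
  );
}

console.log(sampleLines.every(isValidTemplateLine)); // true
```

A check like this catches the most common failure mode of model-generated JSONL: a line with unescaped inner quotes that no longer parses.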
275 changes: 275 additions & 0 deletions contrib/fine-tuning/cdxgen-docs/index-js.jsonl

Large diffs are not rendered by default.

27 changes: 27 additions & 0 deletions contrib/fine-tuning/cdxgen-docs/utils-js.jsonl

Large diffs are not rendered by default.

Empty file modified contrib/fine-tuning/convert-gguf.sh
100644 → 100755
Empty file.
4 changes: 2 additions & 2 deletions contrib/fine-tuning/fine-tune-mlx.sh
@@ -11,7 +11,7 @@ ADAPTERS_PATH=adapters
DATASET_PATH=dataset

FUSED_MODEL=${HF_ORG}/${TOOL_BASE_MODEL}-${TUNING_TOOL}
-FUSED_GGUF_MODEL=${HF_ORG}/${TOOL_BASE_MODEL}-gguf
+# FUSED_GGUF_MODEL=${HF_ORG}/${TOOL_BASE_MODEL}-gguf
QUANT_MODEL_8BIT=${HF_ORG}/${TOOL_BASE_MODEL}-${TUNING_TOOL}-8bit
QUANT_MODEL_6BIT=${HF_ORG}/${TOOL_BASE_MODEL}-${TUNING_TOOL}-6bit
QUANT_MODEL_4BIT=${HF_ORG}/${TOOL_BASE_MODEL}-${TUNING_TOOL}-4bit
@@ -42,7 +42,7 @@ echo "Low-Rank Adaptation (LoRA) fine-tuning ${BASE_MODEL_MLX} with cdx1 dataset
mlx_lm.lora --model ${BASE_MODEL_MLX} --train --data dataset --adapter-path ${ADAPTERS_PATH} --fine-tune-type lora --batch-size 1 --num-layers ${NUM_LAYERS} --iters 1000 --grad-checkpoint

echo "Fuse model to ${FUSED_MODEL} using the cdx1 adapters"
-rm -rf ${FUSED_MODEL} ${FUSED_GGUF_MODEL}
+rm -rf ${FUSED_MODEL}
# gguf export via mlx isn't working
# mlx_lm.fuse --model ${BASE_MODEL_MLX} --adapter-path adapters --hf-path ${FUSED_MODEL} --save-path ${FUSED_MODEL} --de-quantize --export-gguf --gguf-path cdx1-f16.gguf
mlx_lm.fuse --model ${BASE_MODEL_MLX} --adapter-path adapters --hf-path ${FUSED_MODEL} --save-path ${FUSED_MODEL} --de-quantize
6 changes: 5 additions & 1 deletion contrib/fine-tuning/prepare.js
@@ -19,8 +19,12 @@ const validData = [];

 for (const jf of jsonlFiles) {
   const lines = readFileSync(jf, "utf-8");
+  // Ignore empty lines
+  if (!lines.trim().length) {
+    continue;
+  }
   trainData.push(lines);
-  if (jf.includes("readme") || jf.includes("cdxgen") || jf.includes("cli")) {
+  if (jf.includes("readme") || jf.includes("cli") || jf.includes("semantics")) {
     validData.push(lines);
   }
 }
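The split behaviour after this change can be sketched as a small pure function (hypothetical, simplified from `prepare.js`): empty JSONL files are skipped entirely, and only files whose names contain "readme", "cli", or "semantics" are also copied into the validation split.

```javascript
// Hypothetical, simplified version of the prepare.js split logic.
// `files` maps file names to their JSONL contents.
function splitJsonl(files) {
  const trainData = [];
  const validData = [];
  for (const [name, contents] of Object.entries(files)) {
    if (!contents.trim().length) {
      continue; // ignore empty files
    }
    trainData.push(contents);
    if (["readme", "cli", "semantics"].some((k) => name.includes(k))) {
      validData.push(contents);
    }
  }
  return { trainData, validData };
}

const { trainData, validData } = splitJsonl({
  "readme.jsonl": '{"messages": []}',
  "empty.jsonl": "   ",
  "utils-js.jsonl": '{"messages": []}',
});
console.log(trainData.length, validData.length); // 2 1
```

Note that the validation files remain a subset of the training files, which is the same overlap the original script has.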
45 changes: 45 additions & 0 deletions contrib/xBOMEval/cdx1-NOTES.md
@@ -280,3 +280,48 @@ Peak memory: 23.951 GB
```

For the `--num-draft-tokens` argument, use a value between 2 and 4.

## Short-and-quick factual answer

cdx1 is trained to provide a short, quick answer to a given prompt. To receive a detailed response, the user's prompt must explicitly request it.

Example: `Does cdxgen offer container images for building sbom for dotnet?`

```text
mlx_lm.generate --model ./CycloneDX/cdx1-mlx-8bit --prompt "Does cdxgen offer container images for building sbom for dotnet?" --temp 0.05 --system-prompt "You are a helpful assistant."
==========
Yes, cdxgen provides container images for building SBOMs for .NET applications.
==========
```

Include `Can you provide a detailed example?` in the prompt to increase the number of generated tokens.

````text
mlx_lm.generate --model ./CycloneDX/cdx1-mlx-8bit --prompt "Does cdxgen offer container images for building sbom for dotnet? Can you provide a detailed example?" --temp 0.05 --system-prompt "You are a helpful assistant." --max-tokens 128000
==========
Yes, cdxgen offers container images for building SBOMs for .NET. Here's a detailed example:

1. **Pull the .NET SBOM container image:**
```bash
docker pull ghcr.io/cyclonedx/cdxgen-dotnet:latest
```

2. **Run the container with your .NET project:**
```bash
docker run --rm -v $(pwd):/app -w /app ghcr.io/cyclonedx/cdxgen-dotnet:latest cdxgen -t dotnet -o sbom.json
```

This command mounts your current directory (which should contain your .NET project) into the container and runs `cdxgen` to generate an SBOM for the .NET project.
==========
````

It is not possible to modify this behaviour using system prompts alone.

Example:

```text
mlx_lm.generate --model ./CycloneDX/cdx1-mlx-8bit --prompt "Does cdxgen offer container images for building sbom for dotnet?" --temp 0.05 --system-prompt "You are a helpful assistant. Always provide a detailed response with examples and reference to the source." --max-tokens 128000
==========
Yes, cdxgen provides container images for building SBOMs for .NET applications.
==========
```
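Since the detail request must live in the user prompt rather than the system prompt, a caller can append it programmatically. A hypothetical client-side helper (not part of cdxgen or mlx_lm):

```javascript
// Hypothetical helper: cdx1 only elaborates when the user prompt itself
// asks for detail, so append the request suffix to the prompt instead of
// relying on the system prompt. Idempotent: the suffix is added only once.
function withDetailRequest(prompt) {
  const suffix = "Can you provide a detailed example?";
  return prompt.includes(suffix) ? prompt : `${prompt} ${suffix}`;
}

console.log(
  withDetailRequest(
    "Does cdxgen offer container images for building sbom for dotnet?",
  ),
);
```

This keeps the short-answer default for interactive use while letting tooling opt into long-form responses per request.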