feat: reported DCR_share with the description when holdout provided
ivonaVlckova committed Feb 7, 2025
1 parent e0312e3 commit 0624589
Showing 1 changed file with 47 additions and 1 deletion.
48 changes: 47 additions & 1 deletion mostlyai/qa/assets/html/report_template.html
@@ -152,9 +152,15 @@ <h1 id="summary"><span>{{ meta.report_title }}</span>{{ meta.report_subtitle }}<
<td style="width: 70px;">
<div class="result-box-title">
Distances
{% if metrics.distances.dcr_share is not none %}
<div data-bs-toggle="tooltip" data-bs-title='Distances represent the proximity between synthetic samples and their nearest training samples, with an identical match having a distance of zero. For comparison, average distances to holdout samples are shown in light gray, helping assess if the model has learned general patterns common in both training and holdout sets. The DCR share indicates the proportion of synthetic samples that are closer to a training sample than to a holdout sample, and ideally, this value should not significantly exceed 50%, as a higher value could indicate overfitting.'>
{{html_assets['info.svg']}}
</div>
{% else %}
<div data-bs-toggle="tooltip" data-bs-title='This metric represents the average distance between synthetic samples and their nearest training samples. For comparison, the average distances between synthetic samples and samples from a holdout dataset are shown in light gray to assess whether the trained model has learned the general patterns that are common to both the training and the holdout sets.'>
{{html_assets['info.svg']}}
</div>
{% endif %}
</div>
</td>
<td>
@@ -180,6 +186,16 @@ <h1 id="summary"><span>{{ meta.report_title }}</span>{{ meta.report_subtitle }}<
{% endif %}
</td>
</tr>
{% if metrics.distances.dcr_share is not none %}
<tr>
<td>DCR share</td>
<td align="left">
{% if metrics.distances.dcr_holdout is not none %}
{{ "{:.3f}".format(metrics.distances.dcr_share) }}
{% endif %}
</td>
</tr>
{% endif %}
</table>
</td>
</tr>
@@ -388,25 +404,55 @@ <h2 id="distances" class="anchor">Distances</h2>
</tr>
</tbody>
</table>
<br />
<div class="white-box p-3">
{{ distances_dcr_html_chart }}
</div>
<br />
{% if metrics.distances.dcr_share is not none %}
<div class="table-responsive col-md-8 offset-md-2">
<table class='table' style="text-align: left">
<thead>
<tr>
<td style="width: 33%"> </td>
<td style="width: 33%">Observed</td>
<td style="width: 33%"><small class="muted-text">(Optimum)</small></td>
</tr>
</thead>
<tbody>
<tr>
<td>DCR Share</td>
<td>{{ "{:.3f}".format(metrics.distances.dcr_share) }}</td>
<td><small class="muted-text">({{ "{:.3f}".format(0.5) }})</small></td>
</tr>
</tbody>
</table>
</div>
{% endif %}
</div>
<br />
<div class="explainer" style="margin-bottom: 30px">
<div class="explainer-header">
<div class="explainer-icon">{{html_assets['explainer.svg']}}</div>
<div class="explainer-title">Explainer</div>
</div>
{% if metrics.distances.dcr_share is not none %}
<div class="explainer-body">
Synthetic data should be as close to the original training samples as it is to the original holdout samples, which serve as a reference.
This can be checked empirically by measuring the distance from each synthetic sample to its closest original sample, with the training and holdout sets sampled to be of equal size.
DCR Share is the share of synthetic samples that are closer to a training sample than to a holdout sample. It should not be significantly larger than 50%. <br />
In the visualization above, the distances of synthetic samples to the training samples are displayed in green, and the distances of synthetic samples to the holdout samples (if available) are displayed in gray.
A green line that lies significantly to the left of the gray line implies that synthetic samples are closer to the training samples than to the holdout samples, indicating that the model has overfitted to the training data.
A green line that overlaps the gray line confirms that the trained model represents the general rules that can be found in training samples just as well as in holdout samples.
</div>
{% else %}
<div class="explainer-body">
Synthetic data should be as close to the original training samples as it is to the original holdout samples, which serve as a reference.
This can be checked empirically by measuring the distance from each synthetic sample to its closest original sample, with the training and holdout sets sampled to be of equal size.
In the visualization above, the distances of synthetic samples to the training samples are displayed in green, and the distances of synthetic samples to the holdout samples (if available) are displayed in gray.
A green line that lies significantly to the left of the gray line implies that synthetic samples are closer to the training samples than to the holdout samples, indicating that the model has overfitted to the training data.
A green line that overlaps the gray line confirms that the trained model represents the general rules that can be found in training samples just as well as in holdout samples.
</div>
{% endif %}
</div>
</div>
{% endif %}
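
The explainer text above defines DCR Share as the share of synthetic samples that are closer to a training sample than to a holdout sample. As a rough illustration of that definition only, here is a minimal Python sketch. It is not the code from this repository: the function names `nearest_distance` and `dcr_share`, the use of plain Euclidean distance on numeric tuples, and the choice to count ties as "not closer" are all assumptions made for the example.

```python
import math


def nearest_distance(point, reference):
    """Euclidean distance from `point` to its closest row in `reference` (assumed metric)."""
    return min(math.dist(point, ref) for ref in reference)


def dcr_share(synthetic, training, holdout):
    """Share of synthetic rows that are strictly closer to training than to holdout.

    Ties are counted as "not closer"; this tie rule is an assumption of the sketch.
    """
    closer = 0
    for row in synthetic:
        if nearest_distance(row, training) < nearest_distance(row, holdout):
            closer += 1
    return closer / len(synthetic)


# Toy data: training and holdout are of equal size, as the explainer requires.
training = [(0.0, 0.0), (1.0, 1.0)]
holdout = [(0.0, 1.0), (1.0, 0.0)]
synthetic = [(0.1, 0.0), (0.9, 1.0), (0.0, 0.9), (1.0, 0.2)]

share = dcr_share(synthetic, training, holdout)
# Same three-decimal format string that the template uses for the metric.
print("{:.3f}".format(share))
```

In this toy data two of the four synthetic rows sit next to training rows and two sit next to holdout rows, so the share lands on the 0.5 optimum that the template displays. Passing the training rows themselves as the synthetic set would push the share to 1.0, which is the overfitting pattern the explainer warns about.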