v0.4.3
Runtime Changes
Migrating from v0.4.2 to v0.4.3 should reduce profiling time by 30-90%, depending largely on system resources and data size.
Notes
- Remove requirement for tensorflow-addons
- Library now works with tensorflow nightly (Python 3.9)
- Added example on generating a new data labeler
Profiler
- Multiprocessing data preprocessing
- Improved histogram accuracy
- Reduced histogram generation runtime
- Option to set the bin count for histogram
- Expanded precision and switched to precision estimation (as opposed to exact calculations)
- Limit pool size based on CPU and memory limitations
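The pool-size limit above can be pictured with a small sketch. This is not the library's implementation: the helper name `suggest_pool_size`, its parameters, and the assumption that each worker needs about two copies of the data in memory are all hypothetical, chosen only to illustrate capping a worker count by both CPU and memory.

```python
import os


def suggest_pool_size(data_bytes, available_bytes, cpu_count=None, copies_per_worker=2):
    """Hypothetical helper: cap the worker count by CPU count and memory.

    Assumes each worker holds roughly `copies_per_worker` copies of the data.
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Estimated memory footprint of a single worker (at least 1 byte to avoid dividing by zero).
    per_worker = max(1, data_bytes * copies_per_worker)
    # Number of workers that fit in the available memory.
    by_memory = available_bytes // per_worker
    # Never exceed the CPU count, and always allow at least one worker.
    return max(1, min(cpus, by_memory))


# Memory is the tighter limit here: 1000 // (100 * 2) = 5 workers on an 8-CPU machine.
pool_size = suggest_pool_size(data_bytes=100, available_bytes=1000, cpu_count=8)
```

A result like this could then be passed to `multiprocessing.Pool(processes=pool_size)`.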
Data
- Improved JSON detection method
- Option (default) pulls metadata and data separately (data.meta and data.data)
  - data.meta would be part of the JSON which contains no records
  - data.data would be part of the JSON which contains records
- Added option to select keys which represent records
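As a rough illustration of the metadata/data split described above, the sketch below separates a JSON object into a part without records and a part with records. The helper `split_meta_and_data`, its `record_keys` parameter, and the rule "a non-empty list of objects is a set of records" are assumptions made for this example, not the library's actual detection logic.

```python
import json


def split_meta_and_data(payload, record_keys=None):
    """Hypothetical helper: split a JSON object into (meta, data).

    If `record_keys` is given, only those keys are treated as records.
    Otherwise, any non-empty list of objects is assumed to hold records.
    """
    meta, data = {}, {}
    for key, value in payload.items():
        if record_keys is not None:
            is_records = key in record_keys
        else:
            is_records = (
                isinstance(value, list)
                and len(value) > 0
                and all(isinstance(item, dict) for item in value)
            )
        (data if is_records else meta)[key] = value
    return meta, data


raw = '{"source": "sensor-a", "version": 2, "readings": [{"t": 1, "v": 0.5}, {"t": 2, "v": 0.7}]}'
meta, data = split_meta_and_data(json.loads(raw))
# meta holds "source" and "version"; data holds the "readings" records.
```

Passing `record_keys={"readings"}` gives the same split here, but makes the choice explicit when the automatic rule would guess wrong.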
Report
- Precision report now contains additional details
"precision": {
'min': int,
'max': int,
'mean': float,
'var': float,
'std': float,
'sample_size': int,
'margin_of_error': float,
'confidence_level': float
},
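To make the report fields concrete, here is a minimal sketch of how sample-based precision statistics with a margin of error could be produced. The release notes do not state the exact method, so everything here is an assumption: the function names, the simple digit-counting rule, the default sampling ratio, and the normal-approximation margin of error `z * std / sqrt(n)`.

```python
import math
import random
import statistics
from statistics import NormalDist


def significant_digits(text):
    """Simplified digit count: drop sign, decimal point, and leading zeros."""
    digits = text.strip().lstrip("+-").replace(".", "").lstrip("0")
    return len(digits)


def estimate_precision(values, sample_ratio=0.5, confidence_level=0.999, seed=0):
    """Hypothetical sketch: estimate precision stats from a random sample.

    `values` is a list of at least two numeric strings.
    """
    rng = random.Random(seed)
    sample_size = min(len(values), max(2, math.ceil(len(values) * sample_ratio)))
    sample = rng.sample(values, sample_size)
    counts = [significant_digits(v) for v in sample]
    var = float(statistics.variance(counts))  # sample variance
    std = math.sqrt(var)
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": float(statistics.mean(counts)),
        "var": var,
        "std": std,
        "sample_size": sample_size,
        "margin_of_error": z * std / math.sqrt(sample_size),
        "confidence_level": confidence_level,
    }


report = estimate_precision(["1.5", "2.25", "0.125", "10.0"], sample_ratio=1.0)
```

The returned dictionary mirrors the keys listed in the report above; a smaller `sample_ratio` trades accuracy for speed and widens the margin of error.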
Bug fixes
- Fixed error in merging options
- Fixed issue related to merging DateTimeColumns
- Fixed multiprocessing on OSX
- Fixed row calculations if min_true_samples is greater than zero