- New function that creates a csv from a list of fields and constraints, or from a TableSchema #101
- Enable outputting the loaded dataframe #102
- Better naming, type hints and minor refactors #103
- The returned dataframe has its columns properly cast to the detected types #104
- Enable calling main functions from base #97
- Better detection of ints and floats #94
- Better handle NaN values #96
- Reshape exemple.py, clean up code and improve changelog #98
- Refactor test imports, now using the folder tree instead of a pre-made file #93
- Fix inversion (count<=>value) in profile #95
- Outsource many formats to fr-format library #87
- Better date detection #89
- Update dependencies to make tests pass #81
- Update readme #81
- Type hints #81
- Minor refactors #81
- Fixes after production release in hydra #80
- Handle other file formats: xls, xlsx, ods (and more) and analysis through URLs #73
- Handle files with no extension (cc hydra) #79
- Prevent exporting NaN values in profile #72
- Raise ValueError if the analyzed file has a varying number of columns across its first rows #72
- Add logs for columns that would take too much time within a specific test #70
- Refactor some tests to improve performances and make detection more accurate #69
- Try alternative ways to clean text #71
- Change setup.py to better convey dependencies #67
- Change encoding detection for faust-cchardet (forked from cchardet) #66
- Better handling of ints and floats (now not accepting blanks and "+" in string) #62
- Faster routine #59
- Catch OverflowError for latitude and longitude checks #58
- Add CI and upgrade dependencies #49
- Shuffle data before analysis #56
- Better discrimination between `code_departement` and `code_region` #56
- Add schema in output analysis #57
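
The stricter int and float handling noted above (no blanks, no "+" in the string) can be illustrated with a minimal sketch. The helper names `looks_like_int` and `looks_like_float` and the exact patterns are assumptions made for illustration, not the library's actual implementation.

```python
import re

# Hypothetical patterns: an optional minus sign followed by digits only.
# Blanks and a leading "+" are rejected, unlike Python's int() and float().
_INT_PATTERN = re.compile(r"-?\d+")
_FLOAT_PATTERN = re.compile(r"-?\d+\.\d+")


def looks_like_int(value: str) -> bool:
    """Return True only if the whole string is a plain integer."""
    return _INT_PATTERN.fullmatch(value) is not None


def looks_like_float(value: str) -> bool:
    """Return True only if the whole string is a plain decimal number."""
    return _FLOAT_PATTERN.fullmatch(value) is not None
```

Using `fullmatch` rather than `int()` or `float()` is a deliberate choice in this sketch: the built-in converters accept surrounding whitespace and a leading "+", which is exactly what the entry says should no longer be accepted.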
0.4.7 #51
- Allow possibility to analyze entire file instead of a limited number of rows #48
- Better boolean detection #42
- Differentiate python types and format for `date` and `datetime` #43
- Better `code_departement` and `code_commune_insee` detection #44
- Fix header line (`header_row_idx`) detection #44
- Allow possibility to get profile of csv #46
0.4.6 #39
- Fix tests
- Prioritise lat / lon FR detection over more generic lat / lon.
- To reduce false positives, prevent detection of the following if label detection is missing:
['code_departement', 'code_commune_insee', 'code_postal', 'latitude_wgs', 'longitude_wgs', 'latitude_wgs_fr_metropole', 'longitude_wgs_fr_metropole', 'latitude_l93', 'longitude_l93']
- Lower the threshold of label detection so that if a relevant term is detected in the label, it boosts the detection score.
- Add ISO country alpha-3 and numeric detection
- Include camel case parsing in the _process_text function
- Support optional brackets in latlon format
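
As an illustration of the optional-brackets entry above, here is a minimal sketch of a lat/lon parser that accepts both `48.85,2.35` and `[48.85, 2.35]`. The function name `parse_latlon`, the pattern and the range checks are assumptions for illustration and may differ from the library's own detection rules.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: two decimal numbers separated by a comma.
_PAIR_PATTERN = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*")


def parse_latlon(value: str) -> Optional[Tuple[float, float]]:
    """Parse "lat,lon" with optional surrounding square brackets."""
    value = value.strip()
    has_open = value.startswith("[")
    has_close = value.endswith("]")
    if has_open != has_close:
        # Unbalanced brackets are rejected in this sketch.
        return None
    if has_open:
        value = value[1:-1]
    match = _PAIR_PATTERN.fullmatch(value)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Basic WGS84 range check.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon
```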
0.4.5 #29
- Use `netloc` instead of `url` in location dict
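
For readers unfamiliar with the term, `netloc` is the host (and port) part of a URL as returned by the standard library's `urllib.parse.urlparse`. The sketch below shows the difference from the full URL. The helper `build_location` and the shape of the dict beyond the `netloc` key are hypothetical.

```python
from urllib.parse import urlparse


def build_location(url: str) -> dict:
    """Build a hypothetical location dict keyed by netloc, not the full URL."""
    parsed = urlparse(url)
    # netloc keeps only "host[:port]", dropping the scheme and the path.
    return {"netloc": parsed.netloc}


location = build_location("https://example.org:9000/bucket/file.csv")
```

Here `location["netloc"]` is `"example.org:9000"`, whereas the full URL would also carry the scheme and the object path.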
0.4.4 [#24] (#28)
- Prevent crash on empty CSVs
- Add optional arguments encoding and sep to routine and routine_minio functions
- Field detection improvements (code_csp_insee and datetime RFC 822)
- Schema generation improvements with examples
0.4.3 [#24] (#24)
- Add uuid and MongoID detection
- Add new function dedicated to interaction with minio data
- Add table schema automatic generation (only on minio data)
- Modification of calculated score (consider label detection as a boost for score)
0.4.2 [#22] (#22)
Add type detection by header name
0.4.1 [#19] (#19)
Fix bug
- num_rows caused a problem when set to a value other than the default (fixed)
0.4.0 [#18] (#18)
Add detailed output possibility
Details:
- Two modes are now available for the output report: "LIMITED" and "ALL"
- The "ALL" option gives the user the proportion found for each column type in each column
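
The difference between the two modes can be sketched as follows. This is an illustration only: the function `build_report`, its input shape (per-column proportions of values matching each candidate type) and the output keys are assumptions, not the library's actual report format.

```python
from typing import Dict

# Hypothetical input: column name -> {candidate type: proportion of matches}.
Proportions = Dict[str, Dict[str, float]]


def build_report(proportions: Proportions, output_mode: str = "LIMITED") -> dict:
    """Return the best type per column, plus all proportions in "ALL" mode."""
    if output_mode not in ("LIMITED", "ALL"):
        raise ValueError(f"Unknown output mode: {output_mode}")
    report = {}
    for column, scores in proportions.items():
        # Assumes every column has at least one candidate type.
        best_type = max(scores, key=scores.get)
        entry = {"best": best_type}
        if output_mode == "ALL":
            # Expose the detailed proportions for every candidate type.
            entry["proportions"] = dict(scores)
        report[column] = entry
    return report
```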
0.3.0 [#15] (#15)
Fix bugs
Details:
- Facilitate ML Integration
- Add column types detection
- Fix documentation
0.2.1 - #2
Add continuous integration
Details:
- Add configuration for CircleCI
- Add `CONTRIBUTING.md`
- Automatically push new versions to PyPI
- Use semantic versioning
0.2 - #1
Port from Python 2 to Python 3
Details:
- Add license AGPLv3
- Update requirements