Releases: exorde-labs/exorde-client
Minor fixes - Multilang keyword selection - Web interface (debug) - Sentiment v2
- Multilang keyword selection
- Web interface (debug)
- Stability fixes
- Performance improvements
- Sentiment upgraded (v2)
2.5.1 - custom batch size, PUSH notification system, regular updates about REP & collected items over last 24h
2.5.0 - custom batch size, PUSH notification system, regular updates about REP & collected items over last 24h
Release 2.5.0 - Key Features
1. Custom Batch Size
Introduces a new --custom_batch_size N option that lets you override the batch size (valid range: 10 to 200).
Recommended values are between 10 and 50.
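The release notes don't show the client's argument handling; a minimal, hypothetical sketch of how a bounded --custom_batch_size flag could be validated with argparse (only the 10-200 range and the flag name come from the notes, everything else is illustrative):

```python
import argparse

def batch_size(value: str) -> int:
    """Parse --custom_batch_size and enforce the documented 10-200 range."""
    n = int(value)
    if not 10 <= n <= 200:
        raise argparse.ArgumentTypeError("batch size must be between 10 and 200")
    return n

parser = argparse.ArgumentParser()
# The default of 25 is illustrative; the notes only recommend values of 10-50.
parser.add_argument("--custom_batch_size", type=batch_size, default=25)

args = parser.parse_args(["--custom_batch_size", "50"])
print(args.custom_batch_size)  # 50
```

Values outside the range are rejected at parse time instead of failing later in the pipeline.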
2. Notifications Enhancement
Adds a new --ntfy argument for enhanced notifications.
Sends a notification when the application starts.
Implements a notification function.
Prepares the status_notification for future improvements.
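The notification function itself isn't shown in the notes. Assuming --ntfy targets the ntfy.sh publish API (a plain HTTP POST to https://ntfy.sh/&lt;topic&gt;), a minimal sketch might look like this; the topic, title, and helper names are illustrative, not the client's actual code:

```python
import urllib.request

def ntfy_url(topic: str, server: str = "https://ntfy.sh") -> str:
    """Build the publish URL for an ntfy topic."""
    return f"{server}/{topic}"

def notify(topic: str, message: str, title: str = "exorde-client") -> None:
    """POST a plain-text message to an ntfy topic (ntfy.sh publish API)."""
    req = urllib.request.Request(
        ntfy_url(topic),
        data=message.encode("utf-8"),
        headers={"Title": title},  # ntfy takes the notification title from this header
        method="POST",
    )
    urllib.request.urlopen(req, timeout=10)

# notify("my-exorde-topic", "Application started")  # hypothetical topic name
```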
3. Data Collection and Persistence
Displays a message such as: "You collected {rep} unique posts over the last 24h."
Statistics are now persistently stored.
Improves persist.py, including testing concurrent writes and abrupt cancellations.
Introduces a custom serializer.
Introduces the PersistedDict class.
Enhances the once_per_day function using persistence.
Implements notify_at for scheduled notifications.
Sets source_type from process_batch with a static social list.
Updates the IPFS schema following previous changes.
Renames status_notification to statistics_notification.
Introduces docker_version_notifier.
Adds informative messages about ntfy when it is used.
Fixes an embarrassing typo: "embarassement" -> "embarrassment".
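persist.py itself isn't included in these notes; a minimal sketch of the PersistedDict idea, rewriting a JSON file on every assignment (the file path, JSON encoding, and temp-file rename are assumptions, not the client's actual serializer):

```python
import json
import tempfile
from pathlib import Path

class PersistedDict(dict):
    """A dict that rewrites its backing JSON file after every assignment."""

    def __init__(self, path):
        self.path = Path(path)
        data = json.loads(self.path.read_text()) if self.path.exists() else {}
        super().__init__(data)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # Write to a temp file first and rename, so an abrupt cancellation
        # cannot leave a half-written statistics file behind.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self))
        tmp.replace(self.path)

path = Path(tempfile.gettempdir()) / "exorde_stats_demo.json"  # illustrative path
stats = PersistedDict(path)
stats["collected_24h"] = 1234  # survives a restart
```

Reopening the same path reloads the stored statistics, which is what makes once_per_day and scheduled notifications survive restarts.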
Improved keyword extraction: 2-grams, better handling of $XXX, #, and other special characters, and smarter keyword selection for better topic monitoring.
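The upgraded extractor isn't shown here (a later release mentions KeyBERT); a toy frequency-based sketch of 2-gram keyword candidates that keeps $XXX cashtags and #hashtags intact, with illustrative tokenization rules:

```python
import re
from collections import Counter

def bigram_keywords(text: str, top_n: int = 3):
    """Return the most frequent 2-grams, preserving $/# token prefixes."""
    # Keep $TICKER and #hashtag tokens whole instead of stripping the symbols.
    tokens = re.findall(r"[$#]?\w+", text.lower())
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return [bg for bg, _ in Counter(bigrams).most_common(top_n)]

print(bigram_keywords("$btc is pumping and $btc is trending on #crypto twitter"))
```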
This release introduces custom batch sizes, enhances notifications, and improves data collection and persistence, making it a significant update for the Exorde client.
Minor fix on brain + robustness upgrade
v2.4.9 Update setup.py
Robustness: Better statistics, network IO retries, better exceptions
New features (mostly backend) impacting stability & internal items/REP statistics
- Faucet will not fail anymore during first worker initialization
- Spot-data transactions will have a try/retry mechanism, preventing loss of data & REP for the worker
- The Statistics array now shows REP earned per source, since the Worker started.
- Increased try/retry & timeouts for uploads
- What does the statistics table mean?
- The number in each column (except the REP column) shows the count of collected items per source.
- Collected items != REP earned: REP is earned only if you are the first to submit an item (tweet, comment, Reddit post, article).
- The REP column now provides this information.
- Pinned the safetensors version to 0.31.0
Better REP stats + error handling during IPFS upload
Logged statistics 1h / 24h + new scraper customizations + better keywords extract
New features: Only scrapers mode, overloading scrapers with your implementation
Use "--only" to run only a selection of scrapers, e.g. --only twitter or --only youtube,twitter (case sensitive).
Use "--quota source=XXX" to cap the number of items collected per day from a given source at XXX.
Use --module_overwrite or --mo to override a scraper module with your own GitHub implementation. Example: --mo twitter=https://github.com/USERNAME/a7df32de3a60dfdb7a0b
New keyword extraction with Keybert
New features: Only + Overwrite + Quota
Use --module_overwrite or --mo to override a scraper module with your own GitHub implementation. Example: --mo https://github.com/YOU/ch4875eda56be56000ac
Paragraph/chunker system, improved metadata extraction
- added a SoTA model and system (wtpsplit, https://github.com/bminixhofer/wtpsplit; paper: https://arxiv.org/pdf/2305.18893.pdf) to split multilingual text into sentences. Paragraphs are recomposed from the split sentences to make sure they remain below the new max token count per item.
- fixed \n replacement with spaces, which will improve the quality of some top keywords
- the chunker system will fix "tensor size" issues, and therefore increase the data output (instead of losing some batches once in a while)
- improved pre_install procedure to have 2 more models in the docker base image
- added the tiktoken library (OpenAI's GPT-3 tokenizer) to count (and print) the number of tokens in each item, to help decide whether the client has to split an item into several pieces (paragraphs)
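The client pairs wtpsplit's sentence splitting with tiktoken's token counts; the recomposition step can be sketched as a greedy packer (whitespace word counts stand in here for the BPE token counts tiktoken would produce):

```python
def pack_sentences(sentences, max_tokens):
    """Greedily recompose split sentences into chunks under max_tokens.

    Word count approximates the BPE token count tiktoken would report;
    the real client would use len(encoder.encode(sent)) instead.
    """
    chunks, current, current_len = [], [], 0
    for sent in sentences:
        n = len(sent.split())  # stand-in for a real token count
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sent)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks

sents = ["one two three", "four five", "six seven eight nine"]
print(pack_sentences(sents, max_tokens=5))
# ['one two three four five', 'six seven eight nine']
```

Keeping each chunk under the model's max token count is what avoids the "tensor size" failures mentioned above.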
Docker image for both amd64/arm64
fix: use chromedriver provided by distro (#27). The official chromedriver is built for amd64 only; this fix installs it from Debian's packages instead, which provide builds for both amd64 and arm64.
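A sketch of what such a Dockerfile change might look like (the base image tag is an assumption; Debian's chromium and chromium-driver packages exist for both architectures):

```dockerfile
# Illustrative base image; the actual exorde-client image may differ.
FROM debian:bookworm-slim

# Install the distro-provided browser and driver instead of the official
# chromedriver binary, so the image builds on both amd64 and arm64.
RUN apt-get update && apt-get install -y --no-install-recommends \
        chromium chromium-driver \
    && rm -rf /var/lib/apt/lists/*
```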