Releases: exorde-labs/exorde-client

Minor fixes - Multilang keyword selection - Web interface (debug) - Sentiment v2

17 Oct 05:43
1b50e85

Multilang keyword selection
Web interface (debug)
Stability fixes
Performance improvement
Sentiment upgraded

2.5.1 - custom batch size, PUSH notification system, regular updates about REP & collected items over last 24h

21 Sep 15:18
5b72fd0

2.5.0 - custom batch size, PUSH notification system, regular updates about REP & collected items over last 24h

20 Sep 13:28
53fa411

Release 2.5.0 - Key Features

1. Custom Batch Size

Introduces a new --custom_batch_size N option that overrides the batch size. N must be between 10 and 200.
Recommended values are between 10 and 50.
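
As an illustration only, the range check described above could be expressed with argparse as follows. This is a minimal sketch, not the client's actual code: the helper name batch_size, the build_parser function and the program name are assumptions made for the example; only the option name and the 10 to 200 range come from the notes.

```python
import argparse

MIN_BATCH, MAX_BATCH = 10, 200  # range stated in the release notes


def batch_size(value: str) -> int:
    """argparse type (hypothetical helper): accept an integer in [MIN_BATCH, MAX_BATCH]."""
    size = int(value)  # a ValueError here is reported by argparse as a usage error
    if not MIN_BATCH <= size <= MAX_BATCH:
        raise argparse.ArgumentTypeError(
            f"batch size must be between {MIN_BATCH} and {MAX_BATCH}, got {size}"
        )
    return size


def build_parser() -> argparse.ArgumentParser:
    # "exorde-sketch" is a placeholder program name, not the real entry point.
    parser = argparse.ArgumentParser(prog="exorde-sketch")
    parser.add_argument("--custom_batch_size", type=batch_size, default=None, metavar="N")
    return parser


args = build_parser().parse_args(["--custom_batch_size", "25"])
print(args.custom_batch_size)
```

Leaving the default as None lets the caller fall back to whatever batch size the client would normally use when the option is not given.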

2. Notifications Enhancement

Adds a new --ntfy argument for enhanced notifications.
Sends a notification when the application starts.
Implements a notification function.
Prepares the status_notification for future improvements.
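
For readers unfamiliar with ntfy: publishing a message is an HTTP POST to a topic URL on an ntfy server. The sketch below shows that idea with the standard library. It assumes the public https://ntfy.sh server and a made-up topic name; the function names are hypothetical, and how the client maps --ntfy to a topic is not stated in these notes.

```python
import urllib.request

NTFY_SERVER = "https://ntfy.sh"  # assumption: public server; a self-hosted one works the same way


def build_notification(topic: str, message: str,
                       title: str = "Exorde client") -> urllib.request.Request:
    """Build (but do not send) a POST that publishes `message` to an ntfy topic."""
    return urllib.request.Request(
        url=f"{NTFY_SERVER}/{topic}",
        data=message.encode("utf-8"),
        headers={"Title": title},  # ntfy reads the notification title from this header
        method="POST",
    )


def send_notification(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical topic name; nothing is sent until send_notification() is called.
request = build_notification("my-hypothetical-topic", "Client started")
print(request.get_method(), request.full_url)
```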

3. Data Collection and Persistence

Displays a message: "Your collected {rep} unique posts over the last 24h."
Statistics are now persistently stored.
Improves persist.py, including testing concurrent writes and abrupt cancellations.
Introduces a custom serializer.
Introduces the PersistedDict class.
Enhances the once_per_day function using persistence.
Implements notify_at for scheduled notifications.
Sets source_type from process_batch with a static social list.
Updates the IPFS schema following previous changes.
Renames status_notification to statistics_notification.
Introduces docker_version_notifier.
Adds informative messages about ntfy when it is used.
Fixes an embarrassing typo: "embarassement" -> "embarrassment".
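
To make the persistence items above more concrete, here is a minimal sketch of a dictionary that survives restarts and of a once-per-day check built on it. It is not the code in persist.py: the class name PersistedDictSketch, the JSON format and the write-then-rename strategy are assumptions chosen for the example.

```python
import json
import os
import tempfile
from datetime import date


class PersistedDictSketch(dict):
    """A dict that rewrites a JSON file after each item assignment (illustrative only)."""

    def __init__(self, path: str):
        super().__init__()
        self.path = path
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as handle:
                super().update(json.load(handle))

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._flush()

    def _flush(self) -> None:
        # Write to a temporary file, then rename it over the target. On POSIX the
        # rename is atomic, so an abrupt cancellation leaves the old or the new file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(self, handle)
        os.replace(tmp_path, self.path)


def once_per_day(store: PersistedDictSketch, name: str, today: date) -> bool:
    """Return True the first time `name` is seen on `today`, False afterwards."""
    stamp = today.isoformat()
    if store.get(name) == stamp:
        return False
    store[name] = stamp
    return True


with tempfile.TemporaryDirectory() as folder:
    stats = PersistedDictSketch(os.path.join(folder, "stats.json"))
    print(once_per_day(stats, "daily_report", date(2023, 9, 20)))
    print(once_per_day(stats, "daily_report", date(2023, 9, 20)))
```

Because the last run date is stored on disk, the check keeps working across restarts, which is the point of basing a once-per-day function on persistence.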

Improved keyword extraction: 2-grams, better handling of $XXX, # and other special characters, and smarter keywords for better topic monitoring.

This release introduces custom batch sizes, enhances notifications, and improves data collection and persistence, making it a significant update for the Exorde client.

Minor fix on brain + robustness upgrade

10 Aug 15:03
6ccd5a9

Robustness: Better statistics, network IO retries, better exceptions

08 Aug 13:25
e8590a4

New features (mostly backend) impacting stability & internal items/REP statistics

  1. The faucet no longer fails during the first worker initialization.
  2. Spot-data transactions now have a try/retry mechanism, preventing loss of data & REP for the worker.
  3. The statistics array now shows REP earned per source since the worker started.
  4. Increased try/retry counts & timeouts for uploads.
  5. Pinned the safetensors version to 0.31.0.

What does the statistics table mean❔

The number in each column (except the REP column) shows the count of collected items per source.
Collected items != REP earned, because REP is earned only if you are the first on an item (tweet, comment, Reddit post, article).
The REP column now provides this information.
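
The try/retry behavior mentioned for spot-data transactions and uploads can be pictured with a generic helper like the one below. This is a sketch under stated assumptions, not the client's implementation: the function name with_retries, the exponential backoff and the choice of retried exception types are all made up for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation` until it succeeds or `attempts` tries have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError) as error:  # assumed transient errors
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... base_delay
    assert last_error is not None
    raise last_error


calls = {"count": 0}


def flaky_upload() -> str:
    """Stand-in for a network call: fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network failure")
    return "uploaded"


# A no-op sleep keeps the demonstration instant.
print(with_retries(flaky_upload, attempts=5, sleep=lambda seconds: None))
```

Retrying only on errors that are likely to be transient avoids hiding genuine bugs behind repeated attempts.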

better REP stats + error handling when IPFS upload

07 Aug 21:57
f0cf294
  • Faucet will not fail anymore during first worker initialization
  • Spot-data transactions will have a try/retry mechanism, preventing loss of data & REP for the worker
  • The statistics array now shows REP earned per source since the worker started.

What does the statistics table mean❔

The number in each column (except the REP column) shows the count of collected items per source.
Collected items != REP earned, because REP is earned only if you are the first on an item (tweet, comment, Reddit post, article).
The REP column now provides this information.
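
The difference between the two kinds of columns can be shown with a toy tally. This is purely illustrative: the function name tally and the input shape are assumptions, and the second counter counts items that would qualify for REP (you were first), not actual REP amounts, which these notes do not specify.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally(items: Iterable[Tuple[str, bool]]) -> Tuple[Counter, Counter]:
    """Count collected items per source, and separately those where the worker was first."""
    collected: Counter = Counter()
    first_items: Counter = Counter()
    for source, was_first in items:
        collected[source] += 1       # every collected item is counted
        if was_first:
            first_items[source] += 1  # only first-seen items qualify for REP
    return collected, first_items


# Hypothetical sample: (source, whether this worker was first on the item)
sample = [("twitter", True), ("twitter", False), ("reddit", True), ("twitter", False)]
collected, first_items = tally(sample)
print(dict(collected), dict(first_items))
```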

Logged statistics 1h / 24h + new scraper customizations + better keywords extract

03 Aug 20:07
0a2c2f0

New features: Only scrapers mode, overloading scrapers with your implementation
Use --only to run only a selection of scrapers, e.g. --only twitter or --only youtube, twitter (scraper names are case sensitive).
Use --quota source=XXX to cap the number of items collected per day on a given source at XXX.
Use --module_overwrite (or --mo) to replace a scraper module with your own GitHub implementation, e.g. --mo twitter=https://github.com/USERNAME/a7df32de3a60dfdb7a0b
New keyword extraction with KeyBERT.
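
To show how options shaped like --only and --quota source=XXX could be read, here is a small argparse sketch. It is not the client's parser: the helper names parse_quota and selected_scrapers are hypothetical, and treating the quota as a per-source integer is an assumption based on the description above.

```python
import argparse
from typing import List, Tuple


def parse_quota(value: str) -> Tuple[str, int]:
    """Parse one 'source=NUMBER' pair (hypothetical helper)."""
    source, separator, limit = value.partition("=")
    if not separator or not source or not limit.isdigit():
        raise argparse.ArgumentTypeError(f"expected source=NUMBER, got {value!r}")
    return source, int(limit)


def selected_scrapers(raw: List[str]) -> List[str]:
    """Flatten values such as ['youtube,', 'twitter'] into ['youtube', 'twitter']."""
    names: List[str] = []
    for chunk in raw:
        # Names are kept as typed, since the notes say they are case sensitive.
        names.extend(part.strip() for part in chunk.split(",") if part.strip())
    return names


parser = argparse.ArgumentParser(prog="exorde-sketch")  # placeholder program name
parser.add_argument("--only", nargs="+", default=[])
parser.add_argument("--quota", type=parse_quota, action="append", default=[])

# The shell splits "--only youtube, twitter" into separate words, as simulated here.
args = parser.parse_args(["--only", "youtube,", "twitter", "--quota", "twitter=500"])
print(selected_scrapers(args.only), dict(args.quota))
```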

New features: Only + Overwrite + Quota

03 Aug 09:24
9158db8

New features: Only scrapers mode, overloading scrapers with your implementation
Use --only to run only a selection of scrapers, e.g. --only twitter or --only youtube, twitter (scraper names are case sensitive).
Use --quota source=XXX to cap the number of items collected per day on a given source at XXX.
Use --module_overwrite (or --mo) to replace a scraper module with your own GitHub implementation, e.g. --mo https://github.com/YOU/ch4875eda56be56000ac

Paragraph/chunker system, improved metadata extraction

21 Jul 13:34
306c071
  • added a state-of-the-art model and system to split multilingual text into sentences: https://arxiv.org/pdf/2305.18893.pdf, wtpsplit https://github.com/bminixhofer/wtpsplit. Paragraphs are recomposed from the split sentences so that they remain below the new maximum token count per item.
  • fixed the replacement of \n with spaces, which should improve the quality of some top keywords
  • the chunker system will fix "tensor size" issues, and therefore increase the data output (instead of losing some batches once in a while)
  • improved pre_install procedure to have 2 more models in the docker base image
  • added tiktoken (OpenAI gpt3 tokenizer) library to count (& print) the number of tokens for each item, to help decide if the client has to split an item in several pieces (paragraphs)
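
The recomposition step described above (sentences grouped back into paragraphs that stay under a token limit) can be sketched as a greedy packing loop. This is an assumption-laden illustration, not the client's chunker: the function names are hypothetical, and a whitespace word count stands in for the real tokenizer so the example has no dependencies.

```python
from typing import Callable, List


def count_tokens_whitespace(text: str) -> int:
    """Stand-in token counter: counts whitespace-separated words, not real tokens."""
    return len(text.split())


def recompose_paragraphs(sentences: List[str], max_tokens: int,
                         count_tokens: Callable[[str], int] = count_tokens_whitespace
                         ) -> List[str]:
    """Greedily pack consecutive sentences into paragraphs of at most `max_tokens`."""
    paragraphs: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in sentences:
        sentence_tokens = count_tokens(sentence)
        # Close the current paragraph if adding this sentence would exceed the limit.
        if current and current_tokens + sentence_tokens > max_tokens:
            paragraphs.append(" ".join(current))
            current, current_tokens = [], 0
        # A single sentence longer than max_tokens still becomes its own paragraph.
        current.append(sentence)
        current_tokens += sentence_tokens
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


print(recompose_paragraphs(["a b c", "d e", "f g h i", "j"], max_tokens=5))
```

With tiktoken installed, a counter such as `lambda text: len(encoding.encode(text))`, where `encoding` comes from `tiktoken.get_encoding(...)`, could be passed as `count_tokens`; which encoding the client uses is not stated here.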

Docker image for both amd64/arm64

10 Jul 11:48
44ef863
fix: use chromedriver provided by distro (#27)

The official chromedriver is built for amd64 only. This fix installs it from Debian's packages instead, so that builds are available for both amd64 and arm64.