New chat logs service #81

Open
ForNeVeR opened this issue Sep 4, 2022 · 0 comments

(Extracted from #27).

To reiterate:

  • We have an archive of old logs collected from 2009-08-22 to 2016-12-09, uploaded to ctor and backed up on my computer. These were collected by DeadBot (an old freqbot instance hosted by 0xd34df00d). As far as I remember, the time zone on DeadBot's computer was occasionally very far off.
  • We have an archive from my Miranda IM in the form of a bunch of text files, which may be useful for restoring some information not available in DeadBot's logs. These files are currently unavailable to me, but should arrive in a month or so.
  • Telegram logs of codingteam are available since 2016-10-25.
  • Since 2017-01-25, we have had a Telegram to XMPP integration that mostly works and syncs the messages (though the timestamps are obviously different from XMPP). There were two different integration bots (an older one using chat-linker, and a newer one using Emulsion).
  • We also have a Matrix room that's not connected to the other ones, though we plan to connect it. It's also a part of our history, and thus should be a part of our logs.
  • Until now, we were relying on jabber.ru to store our chat logs, but somewhere between 2022-02-24 and today these were taken down completely.

So, we have several different chat networks sometimes interlinked, and several different sources of information (sometimes possibly contradicting each other!) are available for certain log periods. And currently we have no working log solution, and no history of the recent events.

This should be fixed.

The idea of reconciliation of several different log sources and creating a combined log archive is pretty old (I remember us discussing it in the great old days of [email protected], when we were discussing integration of Letnan-Ferry – remember that fella? – and DeadBot), and we have some traces of it in our various IM-related projects:

  • cthulhu-bot from 2009 is able to read the Miranda text log format, though it's difficult for me to remember the exact purpose of that process
  • there's styx-miranda, my project to synchronize logs online between several IM client instances
  • horta-hell is also able to read logs in Miranda's format via its MarkovPlugin
  • we have the aforementioned horta-web, a service that was reusing horta-hell's database, if I remember correctly, to present the logs on a web page
  • we also have Yog-Sothoth (my, my, what a name!), a .NET based chat logging service
  • we have a branch named 27-archivarius in this very repository, where I was trying to introduce yet another simple XMPP-based solution

Most of the stuff is discontinued, and/or XMPP-focused, but we need something bigger here. Some challenges:

  • so far, there's no single "best" source that contains everything (even in the scope of one chat network), so we have to combine the information from several sources
  • the information sources contain a lot of overlapping data, and sometimes even contradict each other, and even themselves (since the timestamps differ a lot between the sources, and sometimes the same message may be repeated several times in old XMPP logs)
  • XMPP and Telegram are sometimes connected and sometimes they aren't, which creates some problems in log reconciliation between the two. We should properly recognize the synchronized messages sent via one of our bots.
  • Telegram logs aren't available via the normal bot API; they are only available via MTProto.
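
To illustrate the "recognize the synchronized messages" challenge, here is a minimal Python sketch. It assumes the bridge bots relay a message as `<nick> text` from a known bot account. Both the account name in `BRIDGE_BOTS` and that relay format are hypothetical placeholders; the real formats used by chat-linker and Emulsion would have to be checked.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical account names of our bridge bots; the real ones must be looked up.
BRIDGE_BOTS = {"emulsion-bot"}

# Assumed relay format: "<original author> original text".
RELAY_PATTERN = re.compile(r"^<(?P<author>[^>]+)>\s?(?P<text>.*)$", re.DOTALL)


class Original(NamedTuple):
    author: str
    text: str


def unwrap_relayed(sender: str, body: str) -> Optional[Original]:
    """Return the original author and text if `body` was relayed by a bridge bot."""
    if sender not in BRIDGE_BOTS:
        return None  # an ordinary message, nothing to unwrap
    match = RELAY_PATTERN.match(body)
    if match is None:
        return None  # the bot said something that is not a relayed message
    return Original(match["author"], match["text"])
```

Once a relayed message is unwrapped this way, it can be matched against the same message logged natively on the other network instead of being treated as a separate message from the bot.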

So, I believe that we'll need to create a separate application (or just a service in this repository? TBD) that

  • will allow the admin to upload the logs in one of the supported formats, including some form of Telegram messages dump (or perform a quick one-shot Telegram message import online)
  • will try to automatically deduplicate the available messages across different sources, and allow the admin to do it manually if required
  • will also run a Telegram bot (via the bot API) to catch and log any new messages online
  • I don't think we need an XMPP bot right now, since Emulsion is more or less reliable these days
  • I believe it's also a good idea to try to recover our XMPP logs between 2016-12 and 2017-02 in case they contain anything important
  • Matrix should also be supported in the future, including the ability to import message history, or receive the logs online
  • the data should be imported in raw format (e.g. deduplication shouldn't alter the original data)
  • there should be a web API and a web UI that allow viewing various data attributes (source, number/types of sources that confirmed a particular event in case of deduplication) and filtering on these attributes
  • we also need to archive the outdated projects
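
The deduplication and raw-import requirements above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions, not a proposed design. The names `RawMessage`, `Event` and `deduplicate`, the source labels, and the five-minute tolerance are all made up for the example. It treats two records as the same event when the author and whitespace-normalized text match and the timestamps are close, and it never modifies the raw records.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)  # frozen: imported raw data is never altered
class RawMessage:
    source: str          # hypothetical label, e.g. "deadbot" or "miranda"
    author: str
    text: str
    timestamp: datetime


@dataclass
class Event:
    # Every raw record that confirms this event, kept verbatim.
    confirmations: list = field(default_factory=list)

    @property
    def sources(self):
        return sorted({m.source for m in self.confirmations})


def deduplicate(messages, tolerance=timedelta(minutes=5)):
    """Group raw messages that look like the same event seen by several sources."""
    events = []
    latest = {}  # (author, normalized text) -> most recent Event with that key
    for msg in sorted(messages, key=lambda m: m.timestamp):
        key = (msg.author.casefold(), " ".join(msg.text.split()))
        event = latest.get(key)
        first_seen = event.confirmations[0].timestamp if event else None
        if event is not None and msg.timestamp - first_seen <= tolerance:
            event.confirmations.append(msg)
        else:
            event = Event([msg])
            events.append(event)
            latest[key] = event
    return events
```

A fixed tolerance like this would not cope with the hour-scale time zone errors mentioned for DeadBot, and it would also merge a message that someone genuinely repeated within the window. That is where per-source clock corrections and the manual admin override would come in. The `sources` attribute is the kind of data the web UI could filter on.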