Status Desktop and Mobile active Waku peers metrics #54

Open
sunleos opened this issue Jan 14, 2025 · 20 comments

Comments

@sunleos

sunleos commented Jan 14, 2025

Would like to request the following metrics:

  1. count of unique active Status Desktop peers per day/week/month
  2. count of unique active Status Mobile peers per day/week/month
@jakubgs
Member

jakubgs commented Jan 15, 2025

There are at least four ways to do this:

  1. Trivial - Use existing telemetry data that is available in Superset.
  2. Simple - Add peer metrics to nim-waku to get information about what is connected to a given node.
  3. Better - Query nim-waku logs for unique peer IDs in Kibana and count those.
  4. Complex - Modify nim-waku logs to include node name with peer ID to categorize the count.

Option #1 already works, but the way data is displayed might not be great, and architecture is missing from the data.
Option #2 would give us counts, but not exactly unique counts, since one mobile or desktop node can connect to many servers.
Option #3 would give us unique count, but without separating out servers, js-waku nodes, or any other type of peer.
Option #4 would give us unique count and types, but would most probably require modification of nim-waku code.

@jakubgs
Member

jakubgs commented Jan 15, 2025

Option #1 is collected by the telemetry service deployed on the infra-misc fleet, and the dashboard is hosted on Superset:

It does contain the distinction between linux and darwin as well as Waku versions, but that's about it.

@jakubgs
Member

jakubgs commented Jan 15, 2025

The option #2 used to look like this:

Image

https://metrics.status.im/d/gxQG_R1Zk/status-peers?orgId=1&refresh=15m&from=now-90d&to=now

These metrics were added by me in this PR:

And were generated from node names which look like this:

StatusIM/vrelease-0.30.1-beta.2/android-arm/go1.11.5
Statusd/v0.34.0-beta.3/linux-amd64/go1.13.1
Geth/v1.9.9-stable-5aa131ca/linux-amd64/go1.13.3

Which would be then turned into peer metrics that looked like this:

 > curl -s localhost:9305/metrics | grep p2p_PeersCount
# HELP p2p_PeersCount Current numbers of peers split by name.
# TYPE p2p_PeersCount gauge
p2p_PeersCount{platform="android-arm",type="StatusIM",version="vrelease-0.30.1-beta.2"} 1
p2p_PeersCount{platform="linux-amd64",type="Statusd",version="v0.29.0-beta.2"} 3

Docs: https://github.com/status-im/status-go/blob/develop/metrics/README.md#metrics
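
For illustration, a minimal sketch of how node names like the ones above could be split into labels and counted. This is not the code from the PR; the function names `parse_node_name`, `peers_count`, and `render` are hypothetical, and the sketch assumes every node name follows the `type/version/platform/runtime` layout shown in the examples.

```python
from collections import Counter


def parse_node_name(name: str) -> dict:
    """Split a 'type/version/platform/runtime' node name into metric labels."""
    parts = name.split("/")
    if len(parts) < 3:
        raise ValueError(f"unexpected node name: {name!r}")
    return {"type": parts[0], "version": parts[1], "platform": parts[2]}


def peers_count(names) -> Counter:
    """Count currently connected peers per (type, version, platform)."""
    counts = Counter()
    for name in names:
        labels = parse_node_name(name)
        counts[(labels["type"], labels["version"], labels["platform"])] += 1
    return counts


def render(counts: Counter) -> str:
    """Render the counts in a Prometheus-like text form."""
    lines = []
    for (typ, version, platform), n in sorted(counts.items()):
        lines.append(
            f'p2p_PeersCount{{platform="{platform}",type="{typ}",version="{version}"}} {n}'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    connected = [
        "StatusIM/vrelease-0.30.1-beta.2/android-arm/go1.11.5",
        "Statusd/v0.34.0-beta.3/linux-amd64/go1.13.1",
        "Geth/v1.9.9-stable-5aa131ca/linux-amd64/go1.13.3",
    ]
    print(render(peers_count(connected)))
```

Note that this only produces a per-node gauge: the peer identity is dropped as soon as the labels are counted.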

This is nice to have, but the main issue is that desktop and mobile peers can connect to multiple server nodes.

@jakubgs changed the title from "Status Desktop and Mobile active Waku peers" to "Status Desktop and Mobile active Waku peers metrics" on Jan 15, 2025
@sunleos
Author

sunleos commented Jan 15, 2025

@jakubgs, thanks for the details above. Really helpful.

Actually looking for unique count and at least some way to separate between Desktop and Mobile data. Are there any other options or ways to get these metrics without resorting to Option #4? Maybe anyone else on the team who may have any ideas.

@sunleos
Author

sunleos commented Jan 15, 2025

@jakubgs, re Option #2 - there is no data point that could help us turn this count into a unique count?

@jakubgs
Member

jakubgs commented Jan 16, 2025

I see no other options than what I outlined, you can try asking some Waku people, but I doubt there's anything else.

Since option #2 is specific to each node on each server and is purely volumetric (meaning it's only a count of nodes of a given type), there is no way to remove duplicates from the counts. And assuming desktop and mobile can connect to many nodes, the sum will always be higher than the actual number of unique app peers.
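
A small sketch of why the sum overcounts, using made-up data. The fleet node names and peer IDs below are hypothetical; in practice a gauge exposes only the number per node, so the sets used for `unique_count` are exactly what is not available from the metrics.

```python
def summed_count(per_node_peers: dict) -> int:
    """What you get by adding up per-node gauges."""
    return sum(len(peers) for peers in per_node_peers.values())


def unique_count(per_node_peers: dict) -> int:
    """What was actually requested: distinct peers across all nodes."""
    return len(set().union(*per_node_peers.values()))


if __name__ == "__main__":
    # Hypothetical fleet: peer "B" is connected to all three nodes.
    per_node = {
        "node-01": {"A", "B"},
        "node-02": {"B", "C"},
        "node-03": {"B"},
    }
    print("sum of gauges:", summed_count(per_node))
    print("unique peers: ", unique_count(per_node))
```

The two numbers only agree when no peer is connected to more than one node.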

@fryorcraken

Note that the Waku team took over the telemetry service for the purpose of measuring reliability and our learning is that this service is not worth being maintained: https://forum.vac.dev/t/the-future-of-status-telemetry/405/4

The network component of Status is handled by the Waku protocol and team, so best to talk to us for this kind of data. The Waku protocol is a decentralized p2p communication framework. This is not a web2 app where infra can magically summon numbers out of cookies.

From a Waku PoV, there is no difference between desktop and mobile nodes. So such data collection will have to be done in an intrusive manner (i.e., via telemetry or some other centralized collection model where the app pushes this specific data to a centralized server).

Having said that, Status mobile can only run in edge mode (light client), whereas desktop can run in both edge and relay.
It should be possible to count edge clients and relay clients separately because of the protocols they mount.

A panel that looks at the protocol and counts "filter" vs "relay" peers would give an indication of "edge" vs "relay" peers. But this will not be exact as our fleet only has a partial view of the network.
All nodes are "store" clients, and the fleet is the only one to provide store services, giving you a more exact total count of nodes (we are working towards removing this centralized point).

Finally, note that I am planning to improve the Waku Network Monitor and deploy it on the Status fleet to get this kind of info. But again, this only provides a partial view of the network.

@jakubgs
Member

jakubgs commented Jan 21, 2025

This is not a web2 app where infra can magically summon numbers out of cookies.

Image

From a Waku PoV, there is no difference between desktop and mobile nodes. So such data collection will have to be done in an intrusive manner (i.e., via telemetry or some other centralized collection model where the app pushes this specific data to a centralized server).

And yet @jm-clius in a Discord thread has said that:

Do you mean in the metadata protocol itself? Generally too much info on the platform, software and version is sensitive yes. It reduces the k-anonymity set, allows attackers to exploit vulnerabilities on specific targets, etc. Like Franck suggested in the issue, I would rather count supported protocols with a network monitor - for example, filter server (those mounting filter-subscribe) vs filter client (those mounting filter-push) gives an approximation of desktop vs mobile.

So there is some difference which we could use as an indication. As long as that is the case, and we provide a log message on peer connection that includes the peer ID and the information that would allow us to detect, or guess, the type of application installation, I can collect that in our ELK stack and graph it in Kibana. That part is trivial.

Now, I don't particularly care if you want to provide this data or not. You can argue with @sunleos about privacy concerns and whatever else you might want to. I'm just interested in delivering what is doable.

@fryorcraken

All apps must connect to our fleet store nodes to retrieve messages because store is not decentralized (yet).
They just connect, get messages, and disconnect. So it does not give you a view of how many peers exist right now, but it can help with trends.

Here is a panel: https://grafana.infra.status.im/d/qrp_ZCTGz/nim-waku-v2?orgId=1&refresh=30s&var-host=store-.*&var-fleet=status.prod&var-dc=All&from=1737415152607&to=1737501552609&viewPanel=2

Something like that would be good, but best to sum all peers across all nodes (a peer will use one store node at a time).
@chaitanyaprem can you confirm my statements?

Now, in terms of mobile vs desktop.

Yes, so what @jm-clius said is the same as what I said. If we count nodes with "filter" vs "relay" protocols mounted, we can get a rough idea of the mobile-desktop ratio. It's not exact because anyone can enable "light mode" in desktop and start looking like a mobile.

For that, looking at the libp2p protocols via identify is what you want (https://grafana.infra.status.im/d/b819dbfe-acb6-4086-8736-578ca148d7cd/waku-networkmonitor-v2?orgId=1&refresh=5s&from=1737498267860&to=1737501867860&viewPanel=6), and again, have it look at Prometheus from the store nodes.

  • /vac/waku/relay/2.0.0 -> desktop node
  • /vac/waku/filter-push/2.0.0-beta1 -> mobile node I believe (not sure)
  • /vac/waku/store/2.0.0-beta4 -> all nodes

@chaitanyaprem @richard-ramos can help confirm here because both desktop in relay mode and mobile in light mode would have "filter" mounted, would the light client return a different protocol on libp2p identify?

@chaitanyaprem

@chaitanyaprem can you confirm my statements?

Your statements are correct, @fryorcraken. One minor correction is that the client may not disconnect from the store node, as we have a periodic query. But in the case of mobile data, clients would end up disconnecting.

one more point to note is that clients can use store nodes from prod/staging fleet.

@chaitanyaprem

@chaitanyaprem @richard-ramos can help confirm here because both desktop in relay mode and mobile in light mode would have "filter" mounted, would the light client return a different protocol on libp2p identify?

identify will have same filter protocol for lightclient whether it is mobile/desktop. So it would be hard to know which is which. e.g I run my desktop in lightclient mode due to monthly data usage limitations from my ISP.

@jakubgs
Member

jakubgs commented Jan 22, 2025

It was also added by @adklempner on Discord that:

I double checked using status-desktop and this log returns a different value after restarting
https://github.com/status-im/status-go/blob/3e0b1b273007ba8fae3c836d097a3880d12fecb9/wakuv2/waku.go#L1075

2025-01-21T22:48:17.763Z    INFO    wakuv2/waku.go:1074    WakuV2 PeerID    {"id":"16Uiu2HAm42hEXsnXi7gLUuuHcCQsFak1Zt2U3EyCnmGz8V3f7DR4"}
2025-01-21T22:49:57.795Z    INFO    wakuv2/waku.go:1074    WakuV2 PeerID    {"id":"16Uiu2HAmMiezvpqdTXAQu9yXjcZxVK2tzA9VnGZ9aScAoAvbX4gc"}

in go-libp2p docs, when starting a new libp2p node: "- If no peer identity is provided, it generates a random Ed25519 key-pair and derives a new identity from it;"

So essentially, unless we enforce a persistent peer ID in mobile and desktop applications counting of unique peers is impossible.
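
To make the consequence concrete, here is a minimal sketch that extracts peer IDs from log lines in the format quoted above and counts the distinct values. The `peer_ids` helper and the regular expression are illustrative assumptions, not part of status-go or of any existing log pipeline.

```python
import json
import re

# Matches the JSON payload at the end of a "WakuV2 PeerID" log line.
PEER_ID_LINE = re.compile(r"WakuV2 PeerID\s+(\{.*\})\s*$")

# The two lines quoted above, both from the same status-desktop install.
SAMPLE = [
    '2025-01-21T22:48:17.763Z    INFO    wakuv2/waku.go:1074    WakuV2 PeerID    {"id":"16Uiu2HAm42hEXsnXi7gLUuuHcCQsFak1Zt2U3EyCnmGz8V3f7DR4"}',
    '2025-01-21T22:49:57.795Z    INFO    wakuv2/waku.go:1074    WakuV2 PeerID    {"id":"16Uiu2HAmMiezvpqdTXAQu9yXjcZxVK2tzA9VnGZ9aScAoAvbX4gc"}',
]


def peer_ids(log_lines):
    """Return the peer IDs found in 'WakuV2 PeerID' log lines, in order."""
    ids = []
    for line in log_lines:
        match = PEER_ID_LINE.search(line)
        if match:
            ids.append(json.loads(match.group(1))["id"])
    return ids


if __name__ == "__main__":
    ids = peer_ids(SAMPLE)
    # One installation, restarted once, shows up as two "unique" peers.
    print(len(set(ids)))
```

Any count of distinct peer IDs over a day or a week would therefore grow with the number of restarts, not only with the number of installations.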

@fryorcraken

This is not a web2 app where infra can magically summon numbers out of cookies.

While hilarious, it was not appropriate. My apologies @sunleos

@fryorcraken

So essentially, unless we enforce a persistent peer ID in mobile and desktop applications counting of unique peers is impossible.

Yes correct.

identify will have same filter protocol for lightclient whether it is mobile/desktop. So it would be hard to know which is which. e.g I run my desktop in lightclient mode due to monthly data usage limitations from my ISP.

Yes correct, hence why I said

. it's not exact because anyone can enable "light mode" in desktop, and start looking like a mobile.

identify will have same filter protocol for lightclient whether it is mobile/desktop.

Ok, so it means that the best way is to count relay peers and subtract them from store peers to get non-relay peers, i.e., light clients, i.e., mostly mobile (but also some desktops).
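
A rough sketch of that subtraction, assuming we had a snapshot of peers connected to a store node together with the protocol list each one reports via identify. The `classify` and `estimate` helpers and the peer IDs are hypothetical; only the protocol strings come from this thread, and the split remains approximate because a desktop in light mode is counted as "light".

```python
RELAY = "/vac/waku/relay/2.0.0"


def classify(protocols) -> str:
    """'relay' if the peer mounts relay, otherwise 'light' (edge) client."""
    return "relay" if RELAY in protocols else "light"


def estimate(peers: dict) -> dict:
    """peers maps peer ID -> protocols from identify, as seen by a store node."""
    store_peers = len(peers)  # assumption: every connected peer is a store client
    relay_peers = sum(1 for p in peers.values() if classify(p) == "relay")
    return {
        "store_peers": store_peers,
        "relay_peers": relay_peers,
        "light_peers": store_peers - relay_peers,
    }


if __name__ == "__main__":
    snapshot = {
        "peer-1": [RELAY, "/vac/waku/store/2.0.0-beta4"],
        "peer-2": ["/vac/waku/filter-push/2.0.0-beta1", "/vac/waku/store/2.0.0-beta4"],
        "peer-3": ["/vac/waku/store/2.0.0-beta4"],
    }
    print(estimate(snapshot))
```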

@jakubgs
Member

jakubgs commented Jan 22, 2025

Now, the question is, will mobile and desktop be fine with changing the behavior to actually persist peer IDs, since it can be considered a reduction in privacy and anonymity? I can probably predict what Jarrad's reaction would be to it.

@jakubgs
Member

jakubgs commented Jan 22, 2025

@ilmotta has confirmed for me that Mobile does indeed change peer ID on every node restart, even just logout is enough:

2025-01-22T09:36:24.194Z        INFO    wakuv2/waku.go:1074     WakuV2 PeerID   {"id": "16Uiu2HAm7G9y5bpWA1b8VDjCkzEQWtcuN3MUCprLFr7g1ibHNWse"}
2025-01-22T09:36:45.463Z        INFO    wakuv2/waku.go:1074     WakuV2 PeerID   {"id": "16Uiu2HAkwP34aG5CN9N9aVAFQtUWkun7dk2D2BjybFVY6EqdVvDu"}
2025-01-22T09:37:31.535Z        INFO    wakuv2/waku.go:1074     WakuV2 PeerID   {"id": "16Uiu2HAmSXEavEmwC5DHr1GvNrtLyGmcwL2bie5bHjXgA8ofmoWU"}

@adklempner

We've identified two caveats:

  1. we cannot expect peer IDs to stay consistent for a single instance of mobile or desktop
  2. the best method for identifying whether a node is running on mobile or desktop is not 100% accurate

I believe the first caveat can be solved by using the same strategy I had implemented for telemetry. In status-go, we take snapshots of peers connected to the libp2p instance and then use the peer store to perform filtering to determine the shard and discovery origin of each connected peer. https://github.com/status-im/status-go/blob/5240da6a2044eb79574ef6c8b82aa0e1798b1584/wakuv2/waku.go#L1259

I think this method for counting peers using periodic snapshots instead of aggregating logs can provide an accurate number of unique peers, even with the caveat that a single instance of Status desktop/mobile does not maintain a consistent peer ID. Presumably, even if the peer ID for a libp2p node changes, another node will only keep a single connection with that instance.

It looks like the Waku peer store also provides the supported protocols for a peer https://github.com/waku-org/go-waku/blob/master/waku/v2/peerstore/inherited.go#L122 This should allow us to split the unique peers into whether they are likely to be mobile or desktop. This may not be 100% accurate given the second caveat.

@fryorcraken

I think this method for counting peers using periodic snapshots instead of aggregating logs can provide an accurate number of unique peers

What do you mean? You can't get peer IDs over Prometheus. How is it different from counting the number of peers connected to store nodes?

@jakubgs
Member

jakubgs commented Feb 3, 2025

@fryorcraken my understanding is that desktop and mobile peers can be connected to multiple waku nodes at the same time. And if that is indeed the case then summing up counts of peers would give a higher number than in reality, unless we compare peer IDs.

@adklempner I don't really see the difference between telemetry and counting unique peer IDs from logs in Kibana for a given period, for example 24 hours. It seems like we are talking about the same thing.

@jakubgs
Member

jakubgs commented Feb 3, 2025

In Kibana it is possible to count unique field values for a given time period:

Image
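
For reference, Kibana's "Unique count" metric is backed by the Elasticsearch `cardinality` aggregation, which returns an approximate distinct count. A minimal sketch of an equivalent query body follows; the field names `peer_id` and `@timestamp` are assumptions about how the logs are indexed, and `unique_peers_query` is a hypothetical helper.

```python
import json


def unique_peers_query(peer_field="peer_id", time_field="@timestamp", since="now-24h"):
    """Build a search body that counts distinct peer IDs since a given time."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"range": {time_field: {"gte": since}}},
        "aggs": {"unique_peers": {"cardinality": {"field": peer_field}}},
    }


if __name__ == "__main__":
    # Daily, weekly, and monthly windows, as requested in the issue.
    for window in ("now-1d", "now-7d", "now-30d"):
        print(json.dumps(unique_peers_query(since=window)))
```

As discussed above, the result is only as meaningful as the peer ID is stable across restarts.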


5 participants