Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated retries #1776

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

Automated retries #1776

wants to merge 17 commits into from

Conversation

ekzyis
Copy link
Member

@ekzyis ekzyis commented Dec 30, 2024

Description

close #1492 based on #1785, #1787

All failed invoices are returned to every client periodically via a new query failedInvoices. To make sure an invoice is only retried by one client at a time, we implemented expiring locks: we first try to update a new timestamp column retryPendingSince on the invoice. Only one client at a time will be able to successfully update the invoice. This lock expires after a minute after which another retry for the same invoice can be attempted by a client.

To stop after three payment attempts (= two retries), a new integer column Invoice.paymentAttempt tracks at which payment attempt we are. This number is increased when we retry an invoice and pass newAttempt: true which we do when we retry these fetched failed invoices. When the number is increased, the payment will start from the beginning with all sender and receiver wallets available.

TODO:

  • retry returned failed invoices on client
  • count retries so we can stop returning them after we hit a limit
  • only show failed invoices that have exhausted retries in notifications
  • don't retry invoices that have been manually cancelled?

added userCancel column, see #1785

  • don't return failed invoices to clients that don't have send wallets

client only polls when it has send wallets

  • don't return intermediate failed invoices that will be retried due to sender or receiver fallbacks

added "cancelledAt" < now() - interval '${WALLET_RETRY_AFTER_MS} milliseconds' filter

  • ⚠️ make sure when this is deployed, this will not start retrying old failed invoices ⚠️

added WALLET_RETRY_BEFORE_MS used in this filter:

AND now()::timestamp <@ tsrange(
  "cancelledAt" + interval '${WALLET_RETRY_AFTER_MS} milliseconds',
  "cancelledAt" + interval '${WALLET_RETRY_BEFORE_MS} milliseconds'
)
  • use lock in retry mutation

new TODO since feedback:

  • also update hasNewNotes
  • fix no notification if no wallets are attached and QR payment fails since they are never retried

Additional Context

  • this will still retry invoices that have been created before deployment and failed if they aren't older than a day
  • if a client detaches all wallets after they tried a payment for the first time, it will never show up in notifications as failed since it wasn't retried often enough. I consider this to be an edge case we don't need to worry about.
  • see https://github.com/stackernews/stacker.news/pull/1776/files#r1907791409

Checklist

Are your changes backwards compatible? Please answer below:

yes

On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:

7. Tested automated retries for posting, replies and zapping with this patch:

diff --git a/api/resolvers/wallet.js b/api/resolvers/wallet.js
index 8e6f3f0e..fb9edd2f 100644
--- a/api/resolvers/wallet.js
+++ b/api/resolvers/wallet.js
@@ -469,7 +469,6 @@ const resolvers = {
         SELECT * FROM "Invoice"
         WHERE "userId" = ${me.id}
         AND "actionState" = 'FAILED'
-        AND "userCancel" = false
         AND "cancelledAt" < now() - ${`${WALLET_RETRY_AFTER_MS} milliseconds`}::interval
         AND "cancelledAt" > now() - ${`${WALLET_RETRY_BEFORE_MS} milliseconds`}::interval
         AND "paymentAttempt" < ${WALLET_MAX_RETRIES}
diff --git a/lib/constants.js b/lib/constants.js
index fe46b71a..09b6981a 100644
--- a/lib/constants.js
+++ b/lib/constants.js
@@ -198,7 +198,7 @@ export const WALLET_CREATE_INVOICE_TIMEOUT_MS = 45_000
 // interval between which failed invoices are returned to a client for automated retries.
 // retry-after must be high enough such that intermediate failed invoices that will already
 // be retried by the client due to sender or receiver fallbacks are not returned to the client.
-export const WALLET_RETRY_AFTER_MS = 60_000 // 1 minute
+export const WALLET_RETRY_AFTER_MS = 5_000 // 1 minute
 export const WALLET_RETRY_BEFORE_MS = 86_400_000 // 24 hours
 // we want to attempt a payment three times so we retry two times
 export const WALLET_MAX_RETRIES = 2
@@ -277,7 +277,7 @@ function RetryHandler ({ children }) {
           console.error('retry poll failed:', err)
         }
         if (!stopped) queuePoll()
-      }, NORMAL_POLL_INTERVAL)
+      }, 1_000)
     }
     const stopPolling = () => {
       stopped = true

Simulate multiple clients with this patch:

diff --git a/wallets/index.js b/wallets/index.js
index 42fb55be..8a9f5f3f 100644
--- a/wallets/index.js
+++ b/wallets/index.js
@@ -255,9 +255,9 @@ function RetryHandler ({ children }) {
         return
       }

-      for (const inv of failedInvoices) {
+      for (const inv of [...failedInvoices, ...failedInvoices]) {
         try {
-          await retry(inv)
+          retry(inv).catch(err => console.error('retry failed:', err))
         } catch (err) {
           // some retries are expected to fail since only one client at a time is allowed to retry
           // these should show up as 'invoice not found' errors

Test p2p zaps with this patch that makes forwards fail and disables the fallback to CCs:

diff --git a/api/paidAction/index.js b/api/paidAction/index.js
index 8feabc30..031e4d7e 100644
--- a/api/paidAction/index.js
+++ b/api/paidAction/index.js
@@ -355,6 +355,7 @@ export async function retryPaidAction (actionType, args, incomingContext) {
       invoiceArgs = { bolt11, wrappedBolt11, wallet, maxFee }
     } catch (err) {
       console.log('failed to retry wrapped invoice, falling back to SN:', err)
+      throw err
     }
   }
diff --git a/worker/paidAction.js b/worker/paidAction.js
index 9b3ecb5a..b9172ebf 100644
--- a/worker/paidAction.js
+++ b/worker/paidAction.js
@@ -268,7 +268,7 @@ export async function paidActionForwarding ({ data: { invoiceId, ...args }, mode
     payViaPaymentRequest({
       lnd,
       request: bolt11,
-      max_fee_mtokens: String(maxFeeMsats),
+      max_fee_mtokens: String(0),
       pathfinding_timeout: LND_PATHFINDING_TIMEOUT_MS,
       confidence: LND_PATHFINDING_TIME_PREF_PPM,
       max_timeout_height: maxTimeoutHeight

For frontend changes: Tested on mobile, light and dark mode? Please answer below:

n/a

Did you introduce any new environment variables? If so, call them out explicitly here:

no

@ekzyis ekzyis added feature new product features that weren't there before wallets labels Dec 30, 2024
@ekzyis ekzyis marked this pull request as draft December 30, 2024 03:28
@ekzyis ekzyis force-pushed the automated-retries branch 4 times, most recently from eb65eb5 to 7c875d1 Compare January 1, 2025 18:02
@ekzyis ekzyis force-pushed the automated-retries branch from 7c875d1 to c0157fc Compare January 1, 2025 18:28
@ekzyis ekzyis force-pushed the automated-retries branch 10 times, most recently from 8757074 to 09cb758 Compare January 8, 2025 18:44
@ekzyis ekzyis marked this pull request as ready for review January 8, 2025 19:54
@ekzyis ekzyis requested a review from huumn January 8, 2025 19:54
api/resolvers/wallet.js Outdated Show resolved Hide resolved
wallets/index.js Outdated Show resolved Hide resolved
wallets/index.js Outdated Show resolved Hide resolved
wallets/index.js Show resolved Hide resolved
@ekzyis ekzyis force-pushed the automated-retries branch from f79f837 to ccbde0d Compare January 9, 2025 01:22
api/resolvers/wallet.js Outdated Show resolved Hide resolved
@ekzyis ekzyis force-pushed the automated-retries branch from ccbde0d to 15c799d Compare January 13, 2025 10:45
api/resolvers/paidAction.js Outdated Show resolved Hide resolved
@ekzyis ekzyis force-pushed the automated-retries branch from 0da5cad to 95e4000 Compare February 4, 2025 18:26
@ekzyis
Copy link
Member Author

ekzyis commented Feb 4, 2025

I have tested 95e4000 again. I used this patch to test the locking and make sure only one client at a time will be able to retry. Essentially, it allows retries of QR codes, makes retrying faster and attempts to retry every invoice twice:

diff --git a/api/resolvers/wallet.js b/api/resolvers/wallet.js
index 8e6f3f0e..fb9edd2f 100644
--- a/api/resolvers/wallet.js
+++ b/api/resolvers/wallet.js
@@ -469,7 +469,6 @@ const resolvers = {
         SELECT * FROM "Invoice"
         WHERE "userId" = ${me.id}
         AND "actionState" = 'FAILED'
-        AND "userCancel" = false
         AND "cancelledAt" < now() - ${`${WALLET_RETRY_AFTER_MS} milliseconds`}::interval
         AND "cancelledAt" > now() - ${`${WALLET_RETRY_BEFORE_MS} milliseconds`}::interval
         AND "paymentAttempt" < ${WALLET_MAX_RETRIES}
diff --git a/lib/constants.js b/lib/constants.js
index 9ec20245..de55ac05 100644
--- a/lib/constants.js
+++ b/lib/constants.js
@@ -201,7 +201,7 @@ export const WALLET_CREATE_INVOICE_TIMEOUT_MS = 45_000
 // interval between which failed invoices are returned to a client for automated retries.
 // retry-after must be high enough such that intermediate failed invoices that will already
 // be retried by the client due to sender or receiver fallbacks are not returned to the client.
-export const WALLET_RETRY_AFTER_MS = 60_000 // 1 minute
+export const WALLET_RETRY_AFTER_MS = 5_000 // 1 minute
 export const WALLET_RETRY_BEFORE_MS = 86_400_000 // 24 hours
 // we want to attempt a payment three times so we retry two times
 export const WALLET_MAX_RETRIES = 2
diff --git a/wallets/index.js b/wallets/index.js
index 42fb55be..ab2580c3 100644
--- a/wallets/index.js
+++ b/wallets/index.js
@@ -255,7 +255,9 @@ function RetryHandler ({ children }) {
         return
       }

-      for (const inv of failedInvoices) {
+      console.log('XXX failedInvoices', failedInvoices)
+
+      for (const inv of [...failedInvoices, ...failedInvoices]) {
         try {
           await retry(inv)
         } catch (err) {
@@ -277,7 +279,7 @@ function RetryHandler ({ children }) {
           console.error('retry poll failed:', err)
         }
         if (!stopped) queuePoll()
-      }, NORMAL_POLL_INTERVAL)
+      }, 1_000)
     }
     const stopPolling = () => {
       stopped = true
2025-02-04.19-29-10.mp4

@ekzyis ekzyis requested a review from huumn February 4, 2025 19:34
@huumn
Copy link
Member

huumn commented Feb 6, 2025

#1776 is the year civilization was automatically retied in america

Copy link
Member

@huumn huumn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I'll QA fully tomorrow and fix any nitpicks (if I have any) tomorrow.

// we want to attempt a payment three times so we retry two times
export const WALLET_MAX_RETRIES = 2
// when a pending retry for an invoice should be considered expired and can be attempted again
export const WALLET_RETRY_TIMEOUT_MS = 60_000 // 1 minute
Copy link
Member

@huumn huumn Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't be subject to multiple WALLET_CREATE_INVOICE_TIMEOUT_MS during a retry right? I remember you mentioning something like that in a previous pr. (Just double checking that this time lock isn't too short.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh, good catch!

We actually can be subject to multiple WALLET_CREATE_INVOICE_TIMEOUT_MS during a retry because the timeout is applied in walletCreateInvoice which is called in the receiver wallet loop until we found a wallet that returned an invoice:

for (const { def, wallet } of wallets) {
const logger = walletLogger({ wallet, models })
try {
logger.info(
`↙ incoming payment: ${formatSats(msatsToSats(msats))}`,
{
amount: formatMsats(msats)
})
let invoice
try {
invoice = await walletCreateInvoice(
{ wallet, def },
{ msats, description, descriptionHash, expiry },
{ logger, models })
} catch (err) {
throw new Error('failed to create invoice: ' + err.message)
}

This also affects WALLET_RETRY_AFTER_MS afaict. It exists to avoid returning failed invoices which will already get retried by a client due to sender or receiver fallbacks (the time between FAILED and RETRYING for an invoice). But since we transition an invoice to RETRYING at the end of the mutation and therefore after we called walletCreateInvoice, this means an invoice can also stay failed for n * WALLET_CREATE_INVOICE_TIMEOUT_MS which is currently larger than WALLET_RETRY_AFTER_MS for n=2 even though we can assume it will get retried.

I think the solution is to also put a timeout on the full loop instead of only on each external wallet call. To give each wallet independent of order the same amount of time, we could divide the loop timeout by the amount of receiving wallets available. This means that the more receiving wallets you have, the less time is allocated to each wallet. This might be confusing since adding a second receiving wallet can make the first receiving wallet more likely to fail since it effectively has now only half the amount of time to respond as before. But I think having a timeout on the loop that doesn't depend on how many wallets someone has is more important and we can of course tweak the timeouts as we like.

For example, we could use a 1 minute timeout on the full loop so if you have two receiving wallets, each wallet gets 30 seconds to return an invoice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is at least less of an issue than I thought initially. Worst case here is that some unused invoices get created. Initially I thought that multiple payments might be made, but only one call to retryPaidAction will succeed because the invoice is transitioned (and returned) conditional on it being in a FAILED state.

tl;dr we can keep this as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing #1734 would be a better idea than any more messing with timeouts anyway

@huumn
Copy link
Member

huumn commented Feb 6, 2025

If there are no send wallets attached, payment failures never show up in notifications because they are never retried.

I think we'll probably want another conditional on the notification that says: if it's been 10 minutes (or whatever) and 0 retries have been attempted, display notification. Perhaps we can reuse WALLET_RETRY_BEFORE_MS and make it narrower, or create another value for this specific purpose.

@huumn
Copy link
Member

huumn commented Feb 7, 2025

Also as noted in chat, the hasNewNotes resolver is not updated to reflect the additional conditions on paid action notifications.

@huumn
Copy link
Member

huumn commented Feb 7, 2025

From chat:

I think the move to deal with having a sending wallet or not is:

  1. just retry (without a qr fallback) whether there's a sending wallet or not. The attempts counter will increase and all will work as intended with just a one line change.
  2. Make a special case when the paid action is an ITEM_CREATE ... don't auto-retry on userCancel but still show the retry notification.

@ekzyis ekzyis marked this pull request as draft February 7, 2025 02:18
@ekzyis ekzyis force-pushed the automated-retries branch from 4f7d9d4 to 2c14275 Compare February 9, 2025 14:20
@ekzyis
Copy link
Member Author

ekzyis commented Feb 9, 2025

2c14275 now immediately shows failed payments for posts or comments in /notifications and also allows auto-retries for them:

2025-02-09.15-17-00.mp4

I also made sure that auto-retries work even if you already retried manually once. It works because we only increment the payment counter if it was automatically retried. This is also why we can retry manually forever.

However, this commit does implement auto-retries even if userCancel is set which is against what was mentioned here:

don't auto-retry on userCancel but still show the retry notification.

I think it's better to have automated retries for everything. It doesn't mess with showing something in the notifications before all automated retries were attempted. If you're currently looking at your notifications and an invoice is automatically retried, it will update the notification you're looking at.

Or am I missing something why we shouldn't allow automated retries for userCancel in this specific case?

We don't do it for zaps because there we don't know if they expected that the zap gets aborted if they close the QR code.

But if you posted something and already tried to pay once, I think we can assume they want to continue trying to pay for it.1

edit: I still want to think more about this but it's definitely good to know that we can never accidentally pay twice thanks to our lock in retries as mentioned in #1776 (comment) (but I also want to think through this myself more).

Footnotes

  1. I have an item where that is not true but I think it's still a good general assumption.

@ekzyis ekzyis marked this pull request as ready for review February 9, 2025 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature new product features that weren't there before wallets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

automatic retries of failed paid actions
2 participants