Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] token_mint_address in tokens_solana.transfers is different on solscan #6690

Open
0xroll opened this issue Sep 8, 2024 · 15 comments
Open
Assignees
Labels
bug Something isn't working in review Assignee is currently reviewing the PR

Comments

@0xroll
Copy link
Contributor

0xroll commented Sep 8, 2024

Description

token_mint_address in tokens_solana.transfers is different on solscan .

some examples in the discord thread https://discord.com/channels/757637422384283659/1281025587405787177/1282371830186774651

Current behavior

token_mint_address showing a different address

Expected behavior

addresses should match with solana explorers

Impacted model(s)

https://github.com/duneanalytics/spellbook/blob/main/dbt_subprojects/solana/models/tokens/solana/tokens_solana_transfers.sql#L66

Possible solution

havnt thought of a solution yet, just creating this issue to track

@0xroll 0xroll added the bug Something isn't working label Sep 8, 2024
@tentabs00
Copy link

tentabs00 commented Sep 17, 2024

here are two specific transactions with a different dune token address than what solscan shows:

Transaction 1

  • transaction signature: 2tjLmmksKz636jKacMoNnGKTvUdBCSWsUtzuswGskXae8PNa9Gr1Qb2XZdRtK5ne4BR7oB8mryV2RN6t8bqiULCP
  • token contract address per solscan: MNDEFzGvMt87ueuHvVU9VcTqsAP5b3fTGPsHuuPA5ey
  • token mint address per dune: DriFtupJYLTosbwoN8koMbEYSx54aFAVLddWsbksjwg7
  • UPDATED 2024-02-28: the token mint address in dune is now showing KMNo3nJsBXfcpJTVhZcXLW7RmTwTt4GVFE7suUBo9sS which is different but still not matching the blockchain
  • dune query showing incorrect token mint address:
select *
from tokens_solana.transfers t
where t.block_date = cast('2024-07-21' as date) -- date filter to improve performance
and t.tx_id = '2tjLmmksKz636jKacMoNnGKTvUdBCSWsUtzuswGskXae8PNa9Gr1Qb2XZdRtK5ne4BR7oB8mryV2RN6t8bqiULCP' 

Transaction 2

  • transaction signature: 47JfrTa9NaPQJmzQM1sJ8JcUbsSDAycjuABV1GxT3bZNQesQSNamtEWWyKFYt81SvEqP9gkvgEyw1KNGaxHsteLk
  • token contract address per solscan: MNDEFzGvMt87ueuHvVU9VcTqsAP5b3fTGPsHuuPA5ey
  • token mint address per dune: EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v
  • dune query showing incorrect token mint address:
select *
from tokens_solana.transfers t
where t.block_date = cast('2024-04-15' as date) -- date filter to improve performance
and t.tx_id = '47JfrTa9NaPQJmzQM1sJ8JcUbsSDAycjuABV1GxT3bZNQesQSNamtEWWyKFYt81SvEqP9gkvgEyw1KNGaxHsteLk'

Query to retrieve negative balances

here is a dune query showing all negative balances for the MNDE solana token, you can remove the token filter in the first CTE to assess all tokens

-- filtering to only a set of tokens to improve performance
-- you can remove this to see the results for all tokens
with transfers_filtered as (
    select *
    from tokens_solana.transfers t
    where t.token_mint_address in (
        'MNDEFzGvMt87ueuHvVU9VcTqsAP5b3fTGPsHuuPA5ey'
    )
),
-- aggregate wallet-level transfers by counting transfers in as positive and transfers out as negative
wallet_transfers as (
    select 'solana' as chain
    ,t.block_time
    ,t.from_token_account as wallet_address
    ,-t.amount as transfer_amount
    ,token_mint_address
    from transfers_filtered t
    
    union all
    
    select 'solana' as chain
    ,t.block_time
    ,t.to_token_account as wallet_address
    ,t.amount as transfer_amount
    ,token_mint_address
    from transfers_filtered t
),

-- calculate wallet balances by summing net transfers over time for each wallet
wallet_balances as (
    select chain
    ,block_time
    ,wallet_address
    ,token_mint_address
    ,transfer_amount
    ,sum(transfer_amount) over (partition by chain,wallet_address,token_mint_address order by block_time asc) as balance
    from wallet_transfers
)

select chain
,token_mint_address
,wallet_address
,min(balance) as lowest_balance
from wallet_balances

-- filter to balances below -1 rather than 0 to filter any rounding errors
where balance < -1
group by 1,2,3
order by 1,2,4 desc

@jeff-dude
Copy link
Member

closing issue due to inactivity. if still relevant, and proposed fix is available, feel free to reopen an issue as needed.

@tentabs00
Copy link

these transactions still show the wrong mint address in dune, meaning that dune data doesn't match the actual blockchain.

we are already in the middle of switching our infrastructure to use the solana blockchain in google public datasets rather than the dune API because the incorrect dune data is breaking dbt tests that check for negative token balances.

@jeff-dude jeff-dude reopened this Feb 28, 2025
@jeff-dude
Copy link
Member

happy to reopen the issue, was just doing a mass cleanup.

it's important to note that tokens_solana.transfers is a curated dataset, meaning not the raw solana data we pull from the blockchain. this introduces various stages where the data could have been written incorrectly in the final end table referenced in this issue.

tokens_solana.transfers is a simple union of multiple types of solana transfers:
https://github.com/duneanalytics/spellbook/blob/main/dbt_subprojects/solana/models/_sector/transfers/tokens_solana_transfers.sql

we would need to identify which of these types of transfers the troubled tx's come from. once narrowed down, we can dig into the code which compiles those curated transfers datasets.

can either of you help identify the next steps to a solution?

@tentabs00
Copy link

definitely empathize with the need for mass ticket cleanup. I'll try to take a look at the underlying tables and see if I can identify the source of the issue, thanks for linking the definition.

@jeff-dude
Copy link
Member

definitely empathize with the need for mass ticket cleanup. I'll try to take a look at the underlying tables and see if I can identify the source of the issue, thanks for linking the definition.

it's quite the lineage of spells, so plz do ask questions as you navigate around it 🙏

@jeff-dude jeff-dude added the in review Assignee is currently reviewing the PR label Feb 28, 2025
@jeff-dude jeff-dude self-assigned this Feb 28, 2025
@tentabs00
Copy link

tentabs00 commented Feb 28, 2025

ok as a starting place I can confirm that the raw solana.transactions table shows the correct mint address, so the issue must lie in one of the component transformations.

query to show transaction details for example 1 above, but with the correct mint address:

WITH pre_balances AS (
    SELECT 
        t.signature,
        t.block_date,
        pre.account,
        pre.mint,
        pre.owner,
        pre.amount
    FROM solana.transactions t,
         UNNEST(pre_token_balances) AS pre(account, mint, owner, amount)
    WHERE block_date = cast('2024-07-21' as date)
    AND signature = '2tjLmmksKz636jKacMoNnGKTvUdBCSWsUtzuswGskXae8PNa9Gr1Qb2XZdRtK5ne4BR7oB8mryV2RN6t8bqiULCP'
),
post_balances AS (
    SELECT 
        t.signature,
        post.account,
        post.mint,
        post.owner,
        post.amount
    FROM solana.transactions t,
         UNNEST(post_token_balances) AS post(account, mint, owner, amount)
    WHERE block_date = cast('2024-07-21' as date)
    AND signature = '2tjLmmksKz636jKacMoNnGKTvUdBCSWsUtzuswGskXae8PNa9Gr1Qb2XZdRtK5ne4BR7oB8mryV2RN6t8bqiULCP'
)

SELECT 
    pre.block_date,
    pre.signature,
    pre.account,
    pre.mint,
    pre.owner,
    pre.amount as pre_amount,
    post.amount as post_amount,
    post.amount - pre.amount as transfer
FROM pre_balances pre
FULL OUTER JOIN post_balances post
    ON pre.signature = post.signature
    AND pre.account = post.account
    AND pre.mint = post.mint

@tentabs00
Copy link

the underlying table tokens_solana.spl_transfers_call_transfer_2024_q3 is showing the incorrect token_mint_address. I'm not sure which spell steps are between this table and the correct values in the solana.transactions table so if you can point me in the right direction I can dig in further if necessary.

select *
from tokens_solana.spl_transfers_call_transfer_2024_q3 t
WHERE block_date = cast('2024-07-21' as date)
and t.tx_id = '2tjLmmksKz636jKacMoNnGKTvUdBCSWsUtzuswGskXae8PNa9Gr1Qb2XZdRtK5ne4BR7oB8mryV2RN6t8bqiULCP' 
limit 100

@tentabs00
Copy link

tentabs00 commented Mar 3, 2025

Summary of root cause

the bug is caused by solana owner-account pairs that have transactions with multiple SPL tokens, while the logic in the solana_utils.token_accounts table only allows for a single SPL token to be associated with a pair. the solana blockchain appears to allow an owner-account pairing to be used for multiple SPL tokens as long as the prior token's total balance has been sent and is thus reset to 0.

Walkthrough of example transaction with bug

the record for the account YgiU6QrKidVFS6PhwoeTeHZXiSc8Av2UykCk7umyddo in solana_utils.token_accounts is correctly associated with the owner CS2FQShP7w3TYxEnRV6v7NFtbXpA4rbKRw9kvyNdRWRY . it is also associated with token_mint_address DriFtupJYLTosbwoN8koMbEYSx54aFAVLddWsbksjwg7 which is incorrect for the traction involving receipt of token MNDEFzGvMt87ueuHvVU9VcTqsAP5b3fTGPsHuuPA5ey.

using the dune query below, we can see that account YgiU6Q... did interact with DriFtup... multiple times between 2024-05-20 and 2024-07-16, at which point the full balance was sent to another address. on 2024-07-21, the account received MNDE... for the first time, eventually sending the full balance out on 07-22 and being reassigned back to DriFtup... when it received some after sending out its full MNDE... balance. both the solscan blockchain details and the dune db show this owner-account combination interacting with 5 different SPL tokens, including the MNDE... token relevant to the transaction and the DriF... token it is assigned to in the token_accounts table. this leads to the root cause of the incorrect address: the consolidation of all mint addresses associated with the account via the max_by() function in line 20 of the token_accounts table, which effectively removes the MNDE... token from the resulting table.

Correct treatment

rather than selecting the first token_mint_address by date (i.e. DriF... for this pair), the solana_spl_transfers_call_transfer_macro.sql should be selecting the token_mint_address associated with the owner-account pair as of 2024-07-21 when the MNDE transaction was made, which would correctly yield MNDE....

Dune query showing interactions with 5 tokens by the owner-account pair

select address,
token_balance_owner,
block_time,
token_mint_address,
pre_token_balance,
post_token_balance,
token_balance_change
from solana.account_activity
where address = 'YgiU6QrKidVFS6PhwoeTeHZXiSc8Av2UykCk7umyddo'
and tx_success = true
order by block_time

Blockchain data verifying interactions with 5 tokens by this owner-account pair

here is a link to the blockchain transaction data showing the YgiU6Q...yddo account address interacting with 5 different SPL tokens.

Proposed fix

my initial suggestion would be to modify solana_utils_token_accounts.sql to add a start_date and end_date for when the owner-account pair is connected with the token_mint_address. you could then modify the join logic in rows 81-86 in solana_spl_transfers_call_transfer_macro.sql to also join on the start and end dates, resulting in the correct mint addresses. that said I'm not aware of the full codebase and what other ETLs might be affected by either the current bug or the proposed fix.

@jeff-dude
Copy link
Member

thank you for the detailed breakdown. i have raised this as a bug in our backlog to prioritize, in case no one else proactively opens a PR to try to help resolve.

@tentabs00
Copy link

@jeff-dude can you link me to the definition of spl_token_solana.spl_token_call_transfer? I want to check if there's a way to simply include the mint address into that table which could be a simpler solution

@jeff-dude
Copy link
Member

@jeff-dude can you link me to the definition of spl_token_solana.spl_token_call_transfer? I want to check if there's a way to simply include the mint address into that table which could be a simpler solution

that's a decoded table, so a source built prior to being used in spellbook.
there's a page here, though not sure that helps much. i'm not sure what we can do in the decoding layer to add that field

@tentabs00
Copy link

if that's the case then I think the fix I proposed with modified solana_utils.token_accounts join logic would be necessary. the SQL itself wouldn't be too complex but the business logic implications feel very material.

this modified token_accounts table would shift from a one-to-one to a one-to-many relationship between owner-address pairs and token_mint_address, which is spooky both because changes would be breaking for current queries and because existing queries that use either token_accounts or tokens_solana.transfers likely have misstatements due to the bug. a non-breaking solution could be to make a new solana_utils.token_accounts_dated table that would have the same fields along with activity_start and activity_end and could be used for the transfers table ETLs, but this new table would be larger than the current table's 1.2 trillion rows. either solution would change prior values in tokens_solana.transfers which just by itself is a material change given how central that table is to all solana analysis.

if you think it makes sense for me to draft a PR based on either proposal I can try to do that this or next week, but it might make more sense to wait until a PM or someone else has time to review all this info. let me know if you think drafting a PR would help move things along quicker though.

@jeff-dude
Copy link
Member

our team likely won't have time to prioritize until at the best next week, so if you can find time to open a PR, that would help expedite for sure. we could leverage the PR as a visual / example to help review with the team internally and look for approval.

i agree that tokens_solana.transfers will change, but with the change being towards resolving a bug, we are open to that happening. the tough part is rebuilding them all in prod, as they are massive tables as you noted. we actually had to go down the path of time-based intermediate tables (quarterly) to get them to run efficiently, even though we don't love the setup. it was the quick win to get them built.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working in review Assignee is currently reviewing the PR
Projects
None yet
Development

No branches or pull requests

3 participants