br error prompt not accurate when running br backup and connection to PD fails #57447

Open
fubinzh opened this issue Nov 18, 2024 · 2 comments
Labels: component/br (This issue is related to BR of TiDB.), severity/moderate, type/bug (The issue is confirmed as a bug.)

fubinzh commented Nov 18, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. Run a br backup full command while BR fails to connect to PD (in this run, DNS lookups of the PD members timed out); a reconstructed command line is sketched below.
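
Based on the arguments recorded in the log under item 3, the invocation was roughly the following. The PD endpoint and S3 path are taken from that log line; any other flags used in the original run are not shown here:

```shell
br backup full \
    --pd "http://downstream-pd:2379" \
    --storage "s3://udsv2/fullback_8.1_with_stats"
```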

2. What did you expect to see? (Required)

  1. The backup should fail, and the error message should accurately describe the PD connection failure.

3. What did you see instead (Required)

  1. The error message is not accurate: it complains "running BR in incompatible version of cluster", even though BR and the TiDB cluster are the same version (v8.1.1), and rerunning the backup succeeds. The underlying failure is a PD connection timeout:
[2024/11/18 03:13:35.376 +00:00] [INFO] [meminfo.go:179] ["use cgroup memory hook because TiDB is in the container"]
[2024/11/18 03:13:35.376 +00:00] [INFO] [info.go:52] ["Welcome to Backup & Restore (BR)"] [release-version=v8.1.1] [git-hash=a7df4f9845d5d6a590c5d45dad0dcc9f21aa8765] [git-branch=HEAD] [go-version=go1.21.13] [utc-build-time="2024-08-22 05:51:39"] [race-enabled=false]
[2024/11/18 03:13:35.376 +00:00] [INFO] [common.go:755] [arguments] [__command="br backup full"] [checksum-concurrency=64] [concurrency=128] [ignore-stats=false] [pd="[http://downstream-pd:2379]"] [storage=s3://udsv2/fullback_8.1_with_stats]
[2024/11/18 03:13:35.378 +00:00] [INFO] [conn.go:159] ["new mgr"] [pdAddrs="[downstream-pd:2379]"]
[2024/11/18 03:13:35.381 +00:00] [INFO] [pd_service_discovery.go:991] ["[pd] update member urls"] [old-urls="[http://downstream-pd:2379]"] [new-urls="[http://downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379,http://downstream-pd-1.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379,http://downstream-pd-2.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379]"]
[2024/11/18 03:13:35.381 +00:00] [INFO] [pd_service_discovery.go:1016] ["[pd] switch leader"] [new-leader=http://downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379] [old-leader=]
[2024/11/18 03:13:35.381 +00:00] [INFO] [pd_service_discovery.go:498] ["[pd] init cluster id"] [cluster-id=7437383287030019589]
[2024/11/18 03:13:36.382 +00:00] [WARN] [pd_service_discovery.go:509] ["[pd] failed to check service mode and will check later"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp: lookup downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc: i/o timeout\" target:downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp: lookup downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc: i/o timeout\" target:downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379 status:TRANSIENT_FAILURE"]
[2024/11/18 03:13:36.384 +00:00] [INFO] [collector.go:224] ["units canceled"] [cancel-unit=0]
[2024/11/18 03:13:36.384 +00:00] [INFO] [collector.go:78] ["Full Backup failed summary"] [total-ranges=0] [ranges-succeed=0] [ranges-failed=0]
[2024/11/18 03:13:36.384 +00:00] [WARN] [resource_manager_client.go:302] ["[resource_manager] get token stream error"] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp: lookup downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc: i/o timeout\""]
[2024/11/18 03:13:36.384 +00:00] [INFO] [resource_manager_client.go:290] ["[resource manager] exit resource token dispatcher"]
[2024/11/18 03:13:36.384 +00:00] [INFO] [pd_service_discovery.go:910] ["[pd] cannot update member from this url"] [url=http://downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Canceled desc = context canceled target:downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379 status:TRANSIENT_FAILURE: error:rpc error: code = Canceled desc = context canceled target:downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379 status:TRANSIENT_FAILURE"]
[2024/11/18 03:13:36.384 +00:00] [ERROR] [backup.go:57] ["failed to backup"] [error="running BR in incompatible version of cluster, if you believe it's OK, use --check-requirements=false to skip.: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp: lookup downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc: i/o timeout\""] [errorVerbose="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp: lookup downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc: i/o timeout\"\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/root/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:1580\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/root/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:1202\ngithub.com/pingcap/tidb/br/pkg/version.CheckClusterVersion\n\t/workspace/source/tidb/br/pkg/version/version.go:89\ngithub.com/pingcap/tidb/br/pkg/conn.NewMgr\n\t/workspace/source/tidb/br/pkg/conn/conn.go:176\ngithub.com/pingcap/tidb/br/pkg/task.NewMgr\n\t/workspace/source/tidb/br/pkg/task/common.go:651\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/workspace/source/tidb/br/pkg/task/backup.go:406\nmain.runBackupCommand\n\t/workspace/source/tidb/br/cmd/br/backup.go:56\nmain.newFullBackupCommand.func1\n\t/workspace/source/tidb/br/cmd/br/backup.go:148\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:1039\nmain.main\n\t/workspace/source/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650\nrunning BR in incompatible version of cluster, if you believe it's OK, use --check-requirements=false to skip."] [stack="main.runBackupCommand\n\t/workspace/source/tidb/br/cmd/br/backup.go:57\nmain.newFullBackupCommand.func1\n\t/workspace/source/tidb/br/cmd/br/backup.go:148\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:1039\nmain.main\n\t/workspace/source/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"]
[2024/11/18 03:13:36.384 +00:00] [ERROR] [pd_service_discovery.go:559] ["[pd] failed to update member"] [urls="[http://downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379,http://downstream-pd-1.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379,http://downstream-pd-2.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc:2379]"] [error="context canceled"] [errorVerbose="context canceled\ngithub.com/pingcap/errors.AddStack\n\t/root/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:178\ngithub.com/pingcap/errors.Trace\n\t/root/go/pkg/mod/github.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/tikv/pd/client/retry.(*Backoffer).Exec\n\t/root/go/pkg/mod/github.com/tikv/pd/[email protected]/retry/backoff.go:94\ngithub.com/tikv/pd/client.(*pdServiceDiscovery).updateMemberLoop\n\t/root/go/pkg/mod/github.com/tikv/pd/[email protected]/pd_service_discovery.go:558\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"] [stack="github.com/tikv/pd/client.(*pdServiceDiscovery).updateMemberLoop\n\t/root/go/pkg/mod/github.com/tikv/pd/[email protected]/pd_service_discovery.go:559"]
[2024/11/18 03:13:36.384 +00:00] [INFO] [pd_service_discovery.go:550] ["[pd] exit member loop due to context canceled"]
[2024/11/18 03:13:36.384 +00:00] [ERROR] [main.go:38] ["br failed"] [error="running BR in incompatible version of cluster, if you believe it's OK, use --check-requirements=false to skip.: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp: lookup downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc: i/o timeout\""] [errorVerbose="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp: lookup downstream-pd-0.downstream-pd-peer.uds-cdc-br-scenario-restore-tps-7683462-1-596.svc: i/o timeout\"\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/root/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:1580\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/root/go/pkg/mod/github.com/tikv/pd/[email protected]/client.go:1202\ngithub.com/pingcap/tidb/br/pkg/version.CheckClusterVersion\n\t/workspace/source/tidb/br/pkg/version/version.go:89\ngithub.com/pingcap/tidb/br/pkg/conn.NewMgr\n\t/workspace/source/tidb/br/pkg/conn/conn.go:176\ngithub.com/pingcap/tidb/br/pkg/task.NewMgr\n\t/workspace/source/tidb/br/pkg/task/common.go:651\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/workspace/source/tidb/br/pkg/task/backup.go:406\nmain.runBackupCommand\n\t/workspace/source/tidb/br/cmd/br/backup.go:56\nmain.newFullBackupCommand.func1\n\t/workspace/source/tidb/br/cmd/br/backup.go:148\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:1039\nmain.main\n\t/workspace/source/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650\nrunning BR in incompatible version of cluster, if you believe it's OK, use --check-requirements=false to skip."] [stack="main.main\n\t/workspace/source/tidb/br/cmd/br/main.go:38\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"]
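
The stack trace above shows the error being raised in version.CheckClusterVersion (br/pkg/version/version.go:89) when the PD client's GetAllStores call fails. The Go program below is a minimal sketch, not the actual BR source, of how wrapping every PD error with the version-mismatch message turns a pure transport error into the misleading "incompatible version" prompt; all identifiers are placeholders:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errVersionMismatch stands in for BR's version-mismatch error message.
var errVersionMismatch = errors.New(
	"running BR in incompatible version of cluster, " +
		"if you believe it's OK, use --check-requirements=false to skip.")

// getAllStores is a placeholder for the PD client's GetAllStores call.
func getAllStores(ctx context.Context) ([]string, error) {
	// Simulate the PD connection failure seen in the log above.
	return nil, errors.New("rpc error: code = Unavailable desc = connection error")
}

// checkClusterVersion mimics the suspected pattern: any error returned by PD,
// including a dial timeout, is annotated with the version-mismatch message.
func checkClusterVersion(ctx context.Context) error {
	if _, err := getAllStores(ctx); err != nil {
		return fmt.Errorf("%s: %w", errVersionMismatch, err)
	}
	return nil
}

func main() {
	// Prints the version prompt followed by the RPC error, mirroring the log.
	fmt.Println(checkClusterVersion(context.Background()))
}
```

A more accurate prompt would distinguish a PD connectivity failure (which is retryable, as the successful rerun shows) from a genuine version mismatch before suggesting --check-requirements=false.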

4. What is your TiDB version? (Required)

[release-version=v8.1.1]
[git-hash=a7df4f9845d5d6a590c5d45dad0dcc9f21aa8765]

fubinzh added the type/bug label on Nov 18, 2024

fubinzh commented Nov 18, 2024

/component br

ti-chi-bot added the component/br label on Nov 18, 2024

fubinzh commented Nov 18, 2024

/severity moderate
