New install hangs on helm install of fission-workflows #249

Closed
freeqaz opened this issue Mar 4, 2019 · 14 comments

@freeqaz

freeqaz commented Mar 4, 2019

Summary

When installing with the instructions in the README, helm install hangs for a long time and eventually fails.

Versions

Tested with both Fission versions:

  • 1.0.0
  • 0.12.0

And `fission-workflows` version 0.6.0.

Analysis (So Far)

From the logs, it looks like both the jaeger-agent and workflow containers are throwing errors. I'm running this in minikube, so the environment shouldn't be the issue.

~ kubectl get pods --all-namespaces
NAMESPACE          NAME                                                              READY   STATUS              RESTARTS   AGE
default            workflows-5847fb5c44-5qn95                                        1/2     CrashLoopBackOff    10         31m
fission-builder    workflow-2348-6455c6785d-947pd                                    2/2     Running             0          31m
fission-function   workflow-3a90970f-3e61-11e9-a663-0cfed7af0aa3-3et1w75b-987mfbcd   1/2     CrashLoopBackOff    10         31m
...

jaeger-agent

Here are my startup logs:

~ kubectl logs -n default workflows-5847fb5c44-5qn95 jaeger-agent
{"level":"warn","ts":1551694209.773764,"caller":"tchannel/flags.go:72","msg":"Using deprecated configuration","option":"collector.host-port"}
{"level":"fatal","ts":1551694209.7743757,"caller":"agent/main.go:78","msg":"Could not create collector proxy","error":"could not create collector proxy, address is missing","stacktrace":"main.main.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/main.go:78\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:762\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/main.go:121\nruntime.main\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/proc.go:201"}

For jaeger-agent, this issue seems to be tracked upstream: jaegertracing/jaeger#1395

workflow

Not sure whether this is related to Jaeger, but in my opinion it likely isn't.

Logs:

~ kubectl logs -n fission-function workflow-3a90970f-3e61-11e9-a663-0cfed7af0aa3-3et1w75b-987mfbcd workflow
time="2019-03-04T10:11:26Z" level=info msg="Established gRPC connection to 'workflows.default:5555'"
time="2019-03-04T10:11:36Z" level=fatal msg="Failed to reach workflows deployment: rpc error: code = DeadlineExceeded desc = context deadline exceeded"

Hopefully these issues are temporary. These notes may help somebody else though!

@erwinvaneyk
Member

Hey @freeqaz - thanks for raising the issue! It looks like the fission workflows proxy (the thing that makes Fission Workflows look like an environment in Fission) cannot find the deployment of Fission Workflows in the default namespace. Can you check if that deployment is running, and if not, provide the logs of that pod too?

@freeqaz
Author

freeqaz commented Mar 4, 2019

Thanks for taking a look at this, @erwinvaneyk (and especially so quickly).

I turned off minikube when I went to bed last night, and after restarting it, many of the containers are magically working now. Not exactly sure what the magic is there!

~ kubectl get pods --all-namespaces 
NAMESPACE          NAME                                                              READY   STATUS              RESTARTS   AGE
default            workflows-5847fb5c44-5qn95                                        2/2     Running             15         10h
fission-builder    workflow-2348-6455c6785d-947pd                                    2/2     Running             2          10h
fission-function   workflow-3a90970f-3e61-11e9-a663-0cfed7af0aa3-ohv1obkr-6bbcgv9d   2/2     Running             0          4m27s
fission            buildermgr-8cf45c4bd-qvzxb                                        1/1     Running             4          10h
fission            controller-869b5b89b-qcbjh                                        1/1     Running             3          10h
fission            executor-79c4d454d4-qz569                                         1/1     Running             4          10h
fission            fission-all-prometheus-alertmanager-5c657546c5-wvs6z              2/2     Running             4          10h
fission            fission-all-prometheus-kube-state-metrics-84bd6ddc7d-krs2v        1/1     Running             4          10h
fission            fission-all-prometheus-node-exporter-twxkw                        1/1     Running             2          10h
fission            fission-all-prometheus-pushgateway-64b9f7644f-rz75z               1/1     Running             2          10h
fission            fission-all-prometheus-server-5dff7444fd-6nwvb                    2/2     Running             4          10h
fission            influxdb-868d78d79-vjzss                                          1/1     Running             2          10h
fission            kubewatcher-f8449d568-jhkrf                                       1/1     Running             4          10h
fission            logger-q2wgz                                                      1/1     Running             3          10h
fission            mqtrigger-nats-streaming-8bc879c4c-wv7lz                          1/1     Running             7          10h
fission            nats-streaming-576964bb64-z7bxb                                   1/1     Running             2          10h
fission            redis-0                                                           1/1     Running             2          10h
fission            router-f6975d4dd-gxcmb                                            1/1     Running             4          10h
fission            storagesvc-9856b7cb9-9lsb4                                        1/1     Running             2          10h
fission            timer-79747d5678-5fjmp                                            1/1     Running             4          10h
kube-system        coredns-86c58d9df4-4ff4m                                          1/1     Running             2          10h
kube-system        coredns-86c58d9df4-b6rfl                                          1/1     Running             2          10h
kube-system        etcd-minikube                                                     1/1     Running             0          5m59s
kube-system        kube-addon-manager-minikube                                       1/1     Running             2          10h
kube-system        kube-apiserver-minikube                                           1/1     Running             0          6m16s
kube-system        kube-controller-manager-minikube                                  1/1     Running             3          10h
kube-system        kube-proxy-2rw59                                                  1/1     Running             0          5m24s
kube-system        kube-scheduler-minikube                                           1/1     Running             3          10h
kube-system        registry-creds-ndkc2                                              0/1     ContainerCreating   0          10h
kube-system        storage-provisioner                                               1/1     Running             4          10h
kube-system        tiller-deploy-6d6cc8dcb5-rssff                                    1/1     Running             2          10h

It looks like registry-creds-ndkc2 is hanging: it is still stuck in ContainerCreating 10 minutes after I restarted minikube.
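Rather than scanning the whole listing by eye for stuck pods like this one, a small filter can print only the pods that are not fully ready. This is a minimal sketch, assuming the default column layout of `kubectl get pods --all-namespaces` shown above (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); the function name `not_ready_pods` is hypothetical.

```shell
#!/bin/sh
# not_ready_pods (hypothetical helper): reads `kubectl get pods --all-namespaces`
# output on stdin and prints NAMESPACE/NAME, READY and STATUS for every pod whose
# ready-container count differs from its total, or whose STATUS is not Running.
# Assumes the default column order: NAMESPACE NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  awk 'NR > 1 && NF >= 4 {
    split($3, r, "/")                  # READY column, e.g. "1/2"
    if (r[1] != r[2] || $4 != "Running")
      print $1 "/" $2, $3, $4
  }'
}

# Usage: kubectl get pods --all-namespaces | not_ready_pods
```

Fed the listing above, it would flag only registry-creds-ndkc2; fed the first listing in this issue, it would flag the two CrashLoopBackOff pods.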

fission-workflows is still unable to get the server version:

~ export FISSION_ROUTER=$(minikube ip):$(kubectl -n fission get svc router -o jsonpath='{...nodePort}')
~ fission-workflows version                                   
client: {"Version":"0.6.0","GitDate":"2018-10-15T16:47:17Z","BuildDate":"2018-10-15T16:47:17Z","GitCommit":"78c053231958e0709e9a668a1557968d9a7ec46b"}
server: failed to get version (response error (502 Bad Gateway): )

Here are the logs you requested:

~ kubectl logs -n default workflows-5847fb5c44-5qn95 workflows
time="2019-03-04T20:14:54Z" level=info msg="Starting bundle..." config="&{Nats:<nil> Fission:0xc0006640c0 InternalRuntime:true InvocationController:true WorkflowController:true AdminAPI:true WorkflowAPI:true HTTPGateway:true InvocationAPI:true Metrics:true Debug:false FissionProxy:false}" version="{Version:0.6.0 GitDate:seconds:1539622037  BuildDate:seconds:1539622037  GitCommit:78c053231958e0709e9a668a1557968d9a7ec46b}"
time="2019-03-04T20:15:34Z" level=info msg="Using the in-memory event store"
time="2019-03-04T20:15:34Z" level=info msg="Using Task Runtime: Workflow"
time="2019-03-04T20:15:34Z" level=info msg="Using Task Runtime: Internal"
time="2019-03-04T20:15:34Z" level=info msg="Internal runtime functions: [            compose sleep fail http foreach switch noop nop while javascript if repeat]"
time="2019-03-04T20:15:34Z" level=info msg="Using Task Runtime: Fission" controller="http://controller.fission" executor="http://executor.fission" router="http://router.fission"
time="2019-03-04T20:15:34Z" level=info msg="Using controller: workflow"
time="2019-03-04T20:15:34Z" level=info msg="Using controller: invocation"
time="2019-03-04T20:15:34Z" level=info msg="Serving admin gRPC API at :5555."
time="2019-03-04T20:15:34Z" level=info msg="Serving workflow gRPC API at :5555."
time="2019-03-04T20:15:34Z" level=info msg="Serving workflow invocation gRPC API at :5555."
time="2019-03-04T20:15:34Z" level=info msg="Serving gRPC services at: [::]:5555"
time="2019-03-04T20:15:34Z" level=info msg="Registered Workflow API HTTP Endpoint"
time="2019-03-04T20:15:34Z" level=info msg="Registered Admin API HTTP Endpoint"
time="2019-03-04T20:15:34Z" level=info msg="Registered Workflow Invocation API HTTP Endpoint"
time="2019-03-04T20:15:34Z" level=info msg="Set up prometheus collector: :8080/metrics"
time="2019-03-04T20:15:34Z" level=info msg="Serving HTTP API gateway at: :8080"
time="2019-03-04T20:15:34Z" level=info msg="Setup completed."
~ kubectl logs -n default workflows-5847fb5c44-5qn95 jaeger-agent
<TRUNCATED>
<Repeats the following lines for a long time>
{"level":"info","ts":1551731296.3574235,"caller":"peerlistmgr/peer_list_mgr.go:157","msg":"Not enough connected peers","connected":0,"required":1}
{"level":"info","ts":1551731296.35754,"caller":"peerlistmgr/peer_list_mgr.go:166","msg":"Trying to connect to peer","host:port":"jaeger-collector:14267"}
{"level":"error","ts":1551731296.368513,"caller":"peerlistmgr/peer_list_mgr.go:171","msg":"Unable to connect","host:port":"jaeger-collector:14267","connCheckTimeout":0.25,"error":"dial tcp: lookup jaeger-collector on 10.96.0.10:53: no such host","stacktrace":"github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr.(*PeerListManager).ensureConnections\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr/peer_list_mgr.go:171\ngithub.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr.(*PeerListManager).maintainConnections\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr/peer_list_mgr.go:101"}
{"level":"info","ts":1551731297.3565254,"caller":"peerlistmgr/peer_list_mgr.go:157","msg":"Not enough connected peers","connected":0,"required":1}
{"level":"info","ts":1551731297.3568487,"caller":"peerlistmgr/peer_list_mgr.go:166","msg":"Trying to connect to peer","host:port":"jaeger-collector:14267"}
{"level":"error","ts":1551731297.369408,"caller":"peerlistmgr/peer_list_mgr.go:171","msg":"Unable to connect","host:port":"jaeger-collector:14267","connCheckTimeout":0.25,"error":"dial tcp: lookup jaeger-collector on 10.96.0.10:53: no such host","stacktrace":"github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr.(*PeerListManager).ensureConnections\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr/peer_list_mgr.go:171\ngithub.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr.(*PeerListManager).maintainConnections\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr/peer_list_mgr.go:101"}
{"level":"info","ts":1551731298.356542,"caller":"peerlistmgr/peer_list_mgr.go:157","msg":"Not enough connected peers","connected":0,"required":1}
{"level":"info","ts":1551731298.3571615,"caller":"peerlistmgr/peer_list_mgr.go:166","msg":"Trying to connect to peer","host:port":"jaeger-collector:14267"}
{"level":"error","ts":1551731298.3719258,"caller":"peerlistmgr/peer_list_mgr.go:171","msg":"Unable to connect","host:port":"jaeger-collector:14267","connCheckTimeout":0.25,"error":"dial tcp: lookup jaeger-collector on 10.96.0.10:53: no such host","stacktrace":"github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr.(*PeerListManager).ensureConnections\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr/peer_list_mgr.go:171\ngithub.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr.(*PeerListManager).maintainConnections\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr/peer_list_mgr.go:101"}
~ kubectl logs -n fission-builder workflow-2348-6455c6785d-947pd builder
<Empty Output>
~ kubectl logs -n fission-builder workflow-2348-6455c6785d-947pd fetcher
2019/03/04 20:14:56 Fetcher ready to receive requests
~ kubectl logs -n fission-function   workflow-3a90970f-3e61-11e9-a663-0cfed7af0aa3-ohv1obkr-6bbcgv9d workflow
time="2019-03-04T20:16:37Z" level=info msg="Established gRPC connection to 'workflows.default:5555'"
time="2019-03-04T20:16:37Z" level=info msg="Serving proxy at: :8888"
~ kubectl logs -n fission-function   workflow-3a90970f-3e61-11e9-a663-0cfed7af0aa3-ohv1obkr-6bbcgv9d fetcher 
2019/03/04 20:16:40 Fetcher ready to receive requests
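The repeating jaeger-agent output above is easier to read when condensed to one line per distinct message. A minimal sketch, assuming one JSON object per line with a `"msg":"..."` field as in the logs shown; `summarize_log_msgs` is a hypothetical name, and a message containing escaped quotes would be cut short by the simple `sed` pattern.

```shell
#!/bin/sh
# summarize_log_msgs (hypothetical helper): reads JSON-per-line logs on stdin,
# extracts each "msg" value, and prints every distinct message with the number
# of times it occurred, most frequent first. Lines without a "msg" are skipped.
summarize_log_msgs() {
  sed -n 's/.*"msg":"\([^"]*\)".*/\1/p' | sort | uniq -c | sort -rn
}

# Usage: kubectl logs -n default workflows-5847fb5c44-5qn95 jaeger-agent | summarize_log_msgs
```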

That's everything I can think of to share. I'm able to add a /hello endpoint from the basic Fission examples (not Fission Workflows) and call it successfully. Fission Workflows specifically doesn't seem to work, though, and I'm not sure why. Thanks again for lending your eyeballs and brain to help debug this. :)

@freeqaz
Author

freeqaz commented Mar 4, 2019

Here is the networking config output as well, in case it helps (I'm pretty new to k8s, and especially to debugging it, but I've done a good bit of distributed systems development professionally).

kubectl -n fission get svc       
NAME                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
controller                                  NodePort    10.108.173.137   <none>        80:31313/TCP     11h
executor                                    ClusterIP   10.102.220.142   <none>        80/TCP           11h
fission-all-prometheus-alertmanager         ClusterIP   10.98.100.167    <none>        80/TCP           11h
fission-all-prometheus-kube-state-metrics   ClusterIP   None             <none>        80/TCP           11h
fission-all-prometheus-node-exporter        ClusterIP   None             <none>        9100/TCP         11h
fission-all-prometheus-pushgateway          ClusterIP   10.108.251.24    <none>        9091/TCP         11h
fission-all-prometheus-server               ClusterIP   10.111.237.15    <none>        80/TCP           11h
influxdb                                    ClusterIP   10.101.150.68    <none>        8086/TCP         11h
nats-streaming                              NodePort    10.99.119.165    <none>        4222:31316/TCP   11h
redis                                       ClusterIP   10.98.105.68     <none>        6379/TCP         11h
router                                      NodePort    10.97.58.196     <none>        80:31314/TCP     11h
storagesvc                                  ClusterIP   10.108.22.140    <none>        80/TCP

@freeqaz
Author

freeqaz commented Mar 5, 2019

Some more version info that may help:

~ minikube version
minikube version: v0.34.1
~ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"archive", BuildDate:"2019-03-03T03:43:13Z", GoVersion:"go1.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:00:57Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
~ helm version
Client: &version.Version{SemVer:"v2.13.0", GitCommit:"79d07943b03aea2b76c12644b4b54733bc5958d6", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.0", GitCommit:"79d07943b03aea2b76c12644b4b54733bc5958d6", GitTreeState:"clean"}

Going to try Fission version 0.7.2 next (an exact copy-paste of the install instructions) in a new minikube instance.

@freeqaz
Author

freeqaz commented Mar 5, 2019

I ran those steps, and the helm install step works correctly now. However, when I run the fortunewhale example steps, I get an error. The router logs look like this:

~ kubectl logs -n fission router-74c8d9b4d6-4n89b
2019/03/05 07:15:51 Starting router at port 8888
2019/03/05 07:16:52 Calling getServiceForFunction for function: fortunewhale
2019/03/05 07:16:52 Calling getServiceForFunction for function: fortune
2019/03/05 07:16:52 http: proxy error: Internal error - Failed to read executable.
2019/03/05 07:16:58 Calling getServiceForFunction for function: fortune
2019/03/05 07:16:58 http: proxy error: Internal error - Failed to read executable.
2019/03/05 07:17:01 Tapped 1 services in batch
2019/03/05 07:17:01 Tapped 1 services in batch

Here is the full transcript of the install steps:

~ minikube stop
✋  Stopping "minikube" in kvm2 ...
🛑  "minikube" stopped.
~ minikube delete                   
🔥  Deleting "minikube" from kvm2 ...
💔  The "minikube" cluster has been deleted.
~ minikube start --vm-driver kvm2 --cpus=4 --memory 5000
😄  minikube v0.34.1 on linux (amd64)
🔥  Creating kvm2 VM (CPUs=4, Memory=5000MB, Disk=20000MB) ...
📶  "minikube" IP address is 192.168.39.152
🐳  Configuring Docker as the container runtime ...
✨  Preparing Kubernetes environment ...
🚜  Pulling images required by Kubernetes v1.13.3 ...
🚀  Launching Kubernetes v1.13.3 using kubeadm ... 
🔑  Configuring cluster permissions ...
🤔  Verifying component health .....
💗  kubectl is now configured to use "minikube"
🏄  Done! Thank you for using minikube!
~ helm repo add fission-charts https://fission.github.io/fission-charts/
helm repo update
"fission-charts" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "fission-charts" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈ 
~ helm install --wait -n fission-all --namespace fission --set serviceType=NodePort --set analytics=false fission-charts/fission-all --version 0.7.2
Error: could not find tiller
# running the following because otherwise tiller fails and this was my resolution from previous research
~ minikube addons enable registry-creds
✅  registry-creds was successfully enabled
~ helm init
$HELM_HOME has been configured at /home/user/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.

Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
Happy Helming!
~ helm install --wait -n fission-all --namespace fission --set serviceType=NodePort --set analytics=false fission-charts/fission-all --version 0.7.2
Error: release fission-all failed: timed out waiting for the condition
# this fails but I watched the kubectl logs until everything started before continuing
~ helm install --wait -n fission-workflows fission-charts/fission-workflows --version 0.6.0
NAME:   fission-workflows
LAST DEPLOYED: Mon Mar  4 23:09:05 2019
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Deployment
NAME       READY  UP-TO-DATE  AVAILABLE  AGE
workflows  1/1    1           1          88s

==> v1/Environment
NAME      AGE
workflow  88s

==> v1/Pod(related)
NAME                        READY  STATUS   RESTARTS  AGE
workflows-5847fb5c44-b7ksh  2/2    Running  0         88s

==> v1/Service
NAME                 TYPE       CLUSTER-IP     EXTERNAL-IP  PORT(S)          AGE
workflows            ClusterIP  10.96.170.226  <none>       80/TCP,5555/TCP  88s
workflows-apiserver  ClusterIP  10.100.229.44  <none>       80/TCP,5555/TCP  88s


NOTES:
Hooray! You can now use workflows in Fission.

Usage:
# bash
# Setup a couple of Fission functions
curl https://raw.githubusercontent.com/fission/fission-workflows/master/examples/whales/fortune.sh > fortune.sh
curl https://raw.githubusercontent.com/fission/fission-workflows/master/examples/whales/whalesay.sh > whalesay.sh

fission env create --name binary --image fission/binary-env
fission fn create --name whalesay --env binary --deploy ./whalesay.sh
fission fn create --name fortune --env binary --deploy ./fortune.sh

# Setup a workflow using the workflow environment
curl https://raw.githubusercontent.com/fission/fission-workflows/master/examples/whales/fortunewhale.wf.yaml > fortunewhale.wf.yaml

fission fn create --name fortunewhale --env workflow --src ./fortunewhale.wf.yaml

# Invoke the workflow just like any other Fission function

curl $FISSION_ROUTER/fission-function/fortunewhale

~ export FISSION_ROUTER=$(minikube ip):$(kubectl -n fission get svc router -o jsonpath='{...nodePort}')
~ curl https://raw.githubusercontent.com/fission/fission-workflows/master/examples/whales/fortune.sh > fortune.sh
curl https://raw.githubusercontent.com/fission/fission-workflows/master/examples/whales/whalesay.sh > whalesay.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   242  100   242    0     0   1105      0 --:--:-- --:--:-- --:--:--  1105
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   610  100   610    0     0   3019      0 --:--:-- --:--:-- --:--:--  3034
~ fission env create --name binary --image fission/binary-env
fission fn create --name whalesay --env binary --deploy ./whalesay.sh
fission fn create --name fortune --env binary --deploy ./fortune.sh
environment 'binary' created
Package 'whalesay-sh-bgoh' created
function 'whalesay' created
Package 'fortune-sh-jphk' created
function 'fortune' created
~ curl https://raw.githubusercontent.com/fission/fission-workflows/master/examples/whales/fortunewhale.wf.yaml > fortunewhale.wf.yaml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   397  100   397    0     0     48      0  0:00:08  0:00:08 --:--:--    83
~ fission fn create --name fortunewhale --env workflow --src ./fortunewhale.wf.yaml
Package 'fortunewhale-wf-yaml-khlg' created
function 'fortunewhale' created
~ curl $FISSION_ROUTER/fission-function/fortunewhale
fission function error: []
~ fission route create --method GET --url /fortunewhale --function fortunewhale
trigger '5b006307-c976-42e7-8f9d-8a39d9e42c41' created
~ curl ${FISSION_ROUTER}/fortunewhale
fission function error: []

I tried restarting minikube again and still get the `fission function error: []` issue. Not sure what's going on.

@freeqaz
Author

freeqaz commented Mar 5, 2019

The error in the router logs definitely stems from this line in Fission's binary environment, but I'm not sure which code path triggers it. Is the executable not being written to disk correctly?
https://github.com/fission/fission/blob/master/environments/binary/server.go#L94
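One way to start testing the "not written to disk correctly" hypothesis is to sanity-check the script files themselves before deploying them. This is a rough sketch: `check_userfunc` is hypothetical, and treating a missing `#!` first line as a problem is an assumption about the binary environment, not something confirmed by the Fission source linked above.

```shell
#!/bin/sh
# check_userfunc (hypothetical helper): rough check that a file looks like a
# runnable script. Prints the first problem found (missing, directory, empty,
# unreadable, or no "#!" first line) and returns 1; otherwise prints "ok".
# Requiring a shebang is an assumption, not documented Fission behavior.
check_userfunc() {
  local f="$1" first_line=""
  if [ ! -e "$f" ]; then echo "missing: $f"; return 1; fi
  if [ -d "$f" ]; then echo "is a directory: $f"; return 1; fi
  if [ ! -s "$f" ]; then echo "empty: $f"; return 1; fi
  if [ ! -r "$f" ]; then echo "unreadable: $f"; return 1; fi
  # read only the first line; read fails at EOF without a trailing newline
  # but still sets the variable, hence the "|| true"
  IFS= read -r first_line < "$f" || true
  case "$first_line" in
    '#!'*) echo "ok: $f" ;;
    *) echo "no shebang: $f"; return 1 ;;
  esac
}

# Usage, on the local copies before `fission fn create --deploy`:
#   check_userfunc ./fortune.sh && check_userfunc ./whalesay.sh
```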

@freeqaz
Author

freeqaz commented Mar 5, 2019

Running these steps allowed me to get the registry-creds service running.

$ minikube addons configure registry-creds
$ minikube addons enable registry-creds

Source: kubernetes/minikube#1391

@erwinvaneyk
Member

erwinvaneyk commented Mar 5, 2019

Hey @freeqaz - thanks for the truly extensive overview of what you are running into. It looks like the initial issue might have been transient: I suspect that the images were still downloading(?)

As I understand it, the current issue is that you get errors running the fortunewhale example, correct?

  • do the separate functions (fortune and whalesay) work when you call them directly?

As for the Minikube-related issues, my experience with minikube is similar; it is a bit more cumbersome to work with than cloud-based Kubernetes clusters.

@freeqaz
Author

freeqaz commented Mar 5, 2019

I threw together a script that strings together the steps of this install, including some sleeps to account for start-up delays across components. Same output (just to be extra thorough).

#!/bin/bash
minikube start --vm-driver kvm2 --cpus=4 --memory 5000

minikube addons configure registry-creds
minikube addons enable registry-creds

helm init

helm repo add fission-charts https://fission.github.io/fission-charts/
helm repo update

sleep 60

helm install --wait -n fission-all --namespace fission --set serviceType=NodePort --set analytics=false fission-charts/fission-all --version 0.7.2

sleep 120

helm install --wait -n fission-workflows fission-charts/fission-workflows --version 0.6.0

sleep 120

export FISSION_ROUTER=$(minikube ip):$(kubectl -n fission get svc router -o jsonpath='{...nodePort}')

# Installation instructions
#fission env create --name binary --image fission/binary-env
#fission fn create --name whalesay --env binary --deploy ./whalesay.sh
#fission fn create --name fortune --env binary --deploy ./fortune.sh

# Readme instructions
fission env create --name binary --image fission/binary-env
fission function create --name whalesay --env binary --deploy examples/whales/whalesay.sh
fission function create --name fortune --env binary --deploy examples/whales/fortune.sh
fission function create --name fortunewhale --env workflow --src examples/whales/fortunewhale.wf.yaml
fission route create --method GET --url /fortunewhale --function fortunewhale
sleep 5
curl $FISSION_ROUTER/fortunewhale
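The fixed `sleep` calls in the script above could be replaced with polling, so each step waits only as long as needed and fails loudly if a component never comes up. A minimal sketch: `wait_until` is a hypothetical helper, and the deployment names in the example (`tiller-deploy` in `kube-system`, `router` in `fission`, `workflows` in `default`) are inferred from the pod listings earlier in this thread.

```shell
#!/bin/sh
# wait_until (hypothetical helper): re-runs a command until it succeeds or the
# attempt budget is used up, sleeping DELAY seconds between attempts.
# Returns 0 on the first success, 1 if every attempt failed.
# Usage: wait_until ATTEMPTS DELAY COMMAND [ARGS...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # no need to sleep after the final failed attempt
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Example (deployment names inferred from this thread):
#   wait_until 30 10 kubectl -n kube-system rollout status deployment/tiller-deploy --timeout=10s
#   wait_until 30 10 kubectl -n fission rollout status deployment/router --timeout=10s
#   wait_until 30 10 kubectl -n default rollout status deployment/workflows --timeout=10s
```

Wrapping `kubectl rollout status` this way also covers the window where the deployment has not been created yet, which a single `rollout status` call would report as an error.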

As you suspected, the core issue seems to be that the functions themselves cannot be called. Here are the logs:

~ fission function whalesay
Fatal error: No help topic for 'whalesay'
~ fission function test --name whalesay
Error calling function whalesay: 502; Please try again or fix the error: Using fetched code path: /userfunc/user
Using internal code path: /bin/userfunc
Listening on 8888 ...

~ fission function test --name fortune 
Error calling function fortune: 502; Please try again or fix the error: Using fetched code path: /userfunc/user
Using internal code path: /bin/userfunc
Listening on 8888 ...

~ fission function test --name fortunewhale
Error calling function fortunewhale: 500; Please try again or fix the error: fission function error: []
Fatal error: Error querying logs: Post http://127.0.0.1:36319/proxy/influxdb?db=fissionFunctionLog&params=%7B%22funcuid%22%3A%225a7d1b21-3f1e-11e9-bc7a-ec2a1f6e3e02%22%2C%22time%22%3A0%7D&q=select+%2A+from+%22log%22+where+%22funcuid%22+%3D+%24funcuid+AND+%22time%22+%3E+%24time+LIMIT+1000: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
~ fission env create --name nodejs --image fission/node-env:1.0.0
environment 'nodejs' created
~ curl -LO https://raw.githubusercontent.com/fission/fission/master/examples/nodejs/hello.js
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   119  100   119    0     0    149      0 --:--:-- --:--:-- --:--:--   149
~ fission function create --name hello --env nodejs --code hello.js
Package 'hello-js-bne3' created
function 'hello' created
~ fission function test --name hello                                                        
hello, world!

@freeqaz
Author

freeqaz commented Mar 5, 2019

Hmm... After dumping more of the logs, I'm really not sure what's going on.

~ fission fn logs --name hello
[2019-03-05 08:30:38.502691707 +0000 UTC] 2019/03/05 08:30:38 fetcher received fetch request and started downloading: {1 {hello-js-bne3  default    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] nil [] }   user [] []}
[2019-03-05 08:30:38.803463809 +0000 UTC] 2019/03/05 08:30:38 Successfully placed at /userfunc/user
[2019-03-05 08:30:38.803523459 +0000 UTC] 2019/03/05 08:30:38 Checking secrets/cfgmaps
[2019-03-05 08:30:38.803541648 +0000 UTC] 2019/03/05 08:30:38 Completed fetch request
[2019-03-05 08:30:38.803953498 +0000 UTC] 2019/03/05 08:30:38 elapsed time in fetch request = 337.553324ms
[2019-03-05 08:30:38.893893543 +0000 UTC] user code loaded in 0sec 1.32491ms
[2019-03-05 08:30:38.910765642 +0000 UTC] ::ffff:172.17.0.10 - - [05/Mar/2019:08:30:38 +0000] "POST /specialize HTTP/1.1" 202 - "-" "Go-http-client/1.1"
[2019-03-05 08:30:38.93509294 +0000 UTC] ::ffff:172.17.0.15 - - [05/Mar/2019:08:30:38 +0000] "GET / HTTP/1.1" 200 14 "-" "Go-http-client/1.1"
[2019-03-05 08:30:43.872872156 +0000 UTC] ::ffff:172.17.0.15 - - [05/Mar/2019:08:30:43 +0000] "GET / HTTP/1.1" 200 14 "-" "Go-http-client/1.1"
~ fission fn logs --name whalesay
[2019-03-05 08:28:46.877923291 +0000 UTC] 2019/03/05 08:28:46 fetcher received fetch request and started downloading: {1 {whalesay-sh-cqqb  default    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] nil [] }   user [] []}
[2019-03-05 08:28:47.102908067 +0000 UTC] 2019/03/05 08:28:47 Successfully placed at /userfunc/user
[2019-03-05 08:28:47.102998262 +0000 UTC] 2019/03/05 08:28:47 Checking secrets/cfgmaps
[2019-03-05 08:28:47.103082487 +0000 UTC] 2019/03/05 08:28:47 Completed fetch request
[2019-03-05 08:28:47.103102559 +0000 UTC] 2019/03/05 08:28:47 elapsed time in fetch request = 225.737203ms
~ fission fn logs --name fortune 
[2019-03-05 08:12:29.102268666 +0000 UTC] 2019/03/05 08:12:29 Fetcher ready to receive requests
[2019-03-05 08:12:29.702641391 +0000 UTC] 2019/03/05 08:12:29 fetcher received fetch request and started downloading: {1 {fortune-sh-oglb  default    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] nil [] }   user [] []}
[2019-03-05 08:12:29.912282585 +0000 UTC] 2019/03/05 08:12:29 Successfully placed at /userfunc/user
[2019-03-05 08:12:29.912347546 +0000 UTC] 2019/03/05 08:12:29 Checking secrets/cfgmaps
[2019-03-05 08:12:29.91236786 +0000 UTC] 2019/03/05 08:12:29 Completed fetch request
[2019-03-05 08:12:29.912383891 +0000 UTC] 2019/03/05 08:12:29 elapsed time in fetch request = 268.44322ms
[2019-03-05 08:23:30.140593967 +0000 UTC] 2019/03/05 08:23:30 Received SIGTERM : Dumping stack trace
[2019-03-05 08:23:30.14067836 +0000 UTC] goroutine 20 [running]:
[2019-03-05 08:23:30.140695884 +0000 UTC] runtime/debug.Stack(0x1f75dc0, 0xc420054f90, 0x50716c)
[2019-03-05 08:23:30.140709169 +0000 UTC] 	/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
[2019-03-05 08:23:30.140722408 +0000 UTC] runtime/debug.PrintStack()
[2019-03-05 08:23:30.140734896 +0000 UTC] 	/usr/local/go/src/runtime/debug/stack.go:16 +0x22
[2019-03-05 08:23:30.140747408 +0000 UTC] main.dumpStackTrace()
[2019-03-05 08:23:30.140759866 +0000 UTC] 	src/github.com/fission/fission/environments/fetcher/cmd/main.go:24 +0x20
[2019-03-05 08:23:30.140772498 +0000 UTC] main.main.func1(0xc4200a1500)
[2019-03-05 08:23:30.140784625 +0000 UTC] 	src/github.com/fission/fission/environments/fetcher/cmd/main.go:35 +0x7d
[2019-03-05 08:23:30.140797066 +0000 UTC] created by main.main
[2019-03-05 08:23:30.140809105 +0000 UTC] 	src/github.com/fission/fission/environments/fetcher/cmd/main.go:32 +0xcb
[2019-03-05 08:29:06.057111405 +0000 UTC] 2019/03/05 08:29:06 fetcher received fetch request and started downloading: {1 {fortune-sh-oglb  default    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] nil [] }   user [] []}
[2019-03-05 08:29:06.302904812 +0000 UTC] 2019/03/05 08:29:06 Successfully placed at /userfunc/user
[2019-03-05 08:29:06.302968202 +0000 UTC] 2019/03/05 08:29:06 Checking secrets/cfgmaps
[2019-03-05 08:29:06.302984489 +0000 UTC] 2019/03/05 08:29:06 Completed fetch request
[2019-03-05 08:29:06.30299752 +0000 UTC] 2019/03/05 08:29:06 elapsed time in fetch request = 247.370258ms
[2019-03-05 08:29:13.402366727 +0000 UTC] 2019/03/05 08:29:13 fetcher received fetch request and started downloading: {1 {fortune-sh-oglb  default    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] nil [] }   user [] []}
[2019-03-05 08:29:13.704405161 +0000 UTC] 2019/03/05 08:29:13 Successfully placed at /userfunc/user
[2019-03-05 08:29:13.704483005 +0000 UTC] 2019/03/05 08:29:13 Checking secrets/cfgmaps
[2019-03-05 08:29:13.704498816 +0000 UTC] 2019/03/05 08:29:13 Completed fetch request
[2019-03-05 08:29:13.704512016 +0000 UTC] 2019/03/05 08:29:13 elapsed time in fetch request = 301.640787ms

@freeqaz
Author

freeqaz commented Mar 5, 2019

Hooray, I figured it out! The core of the issue appears to be using a newer fission CLI to manage an older fission installation. Specifically, using a fission CLI >= 1.0.0 against a fission server < 1.0.0 leads to strange behavior. I'm not sure exactly what the underlying bug is, but it is quite annoying to debug.

To test this, I downloaded version 0.7.2 of the fission CLI from here: https://github.com/fission/fission/releases/tag/0.7.2

Then I ran chmod +x fission-cli-linux (as usual).
Afterwards, I deleted and re-created the fortune and whalesay functions like so:

~ ./fission-cli-linux fn delete --name whalesay
function 'whalesay' deleted
~ ./fission-cli-linux fn delete --name fortune 
function 'fortune' deleted
~ ./fission-cli-linux fn create --name whalesay --env binary --deploy ./whalesay.sh
function 'whalesay' created
~ ./fission-cli-linux fn create --name fortune --env binary --deploy ./fortune.sh  
function 'fortune' created
~ ./fission-cli-linux fn test --name fortune                                       
The brain is a wonderful organ; it starts working the moment you get up
in the morning, and does not stop until you get to school.
~ ./fission-cli-linux fn test --name whalesay
 _ 
<   >
 - 
    \
     \
      \
                    ##         .
              ## ## ##        ==
           ## ## ## ## ##    ===
       /"""""""""""""""""\___/ ===
      {                       /  ===-
       \______ O           __/
         \    \         __/
          \____\_______/

Then I tried hitting the existing endpoint again (without modifying it), and it worked!

~ curl http://$FISSION_ROUTER/fortunewhale
 _______________________________________ 
/ Different all twisty a of in maze are \
\ you, passages little.                 /
 --------------------------------------- 
    \
     \
      \
                    ##         .
              ## ## ##        ==
           ## ## ## ## ##    ===
       /"""""""""""""""""\___/ ===
      {                       /  ===-
       \______ O           __/
         \    \         __/
          \____\_______/

Yay! I'm glad to have this resolved.

Do you think we should open an issue upstream?

@freeqaz freeqaz closed this as completed Mar 5, 2019
@freeqaz
Author

freeqaz commented Mar 5, 2019

Going to test if this all works with 1.0.0 again, just to be thorough.

@erwinvaneyk
Member

> Do you think we should open an issue upstream?

Yes, I think so; at the very least it is good to have a record of this for other users to reference. That said, I expect the fix for that issue will be to add a warning (or even an error) when the client and server versions do not match, similar to what Helm does.
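
A client-side check like the one suggested could look roughly like the sketch below. It is not fission's actual code: the names parseVersion and checkVersionCompat are hypothetical, and it assumes both client and server report plain MAJOR.MINOR.PATCH versions (optionally prefixed with "v"). The policy (error on a different major version, warning on a different minor version) is a simplified choice made for illustration, not Helm's exact rule.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion (hypothetical helper) splits "v1.0.0" or "0.12.0" into
// numeric major, minor and patch parts. Pre-release suffixes such as
// "-rc1" are not handled in this sketch.
func parseVersion(v string) ([3]int, error) {
	var parts [3]int
	fields := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(fields) != 3 {
		return parts, fmt.Errorf("invalid version %q", v)
	}
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return parts, fmt.Errorf("invalid version %q: %w", v, err)
		}
		parts[i] = n
	}
	return parts, nil
}

// checkVersionCompat (hypothetical) returns an error when the major
// versions differ and a non-empty warning when only the minor versions differ.
func checkVersionCompat(client, server string) (string, error) {
	c, err := parseVersion(client)
	if err != nil {
		return "", err
	}
	s, err := parseVersion(server)
	if err != nil {
		return "", err
	}
	if c[0] != s[0] {
		return "", fmt.Errorf("incompatible versions: client %s, server %s", client, server)
	}
	if c[1] != s[1] {
		return fmt.Sprintf("warning: client %s and server %s differ in minor version", client, server), nil
	}
	return "", nil
}

func main() {
	// A 1.0.0 CLI talking to a 0.12.0 installation, as in this issue.
	if _, err := checkVersionCompat("1.0.0", "0.12.0"); err != nil {
		fmt.Println(err)
	}
}
```

Running it prints "incompatible versions: client 1.0.0, server 0.12.0". Under semantic versioning, minor releases in the 0.x series may also break compatibility, so a stricter check might treat any 0.x minor mismatch as an error as well.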

@freeqaz
Author

freeqaz commented Mar 7, 2019

Opened #250 to follow up on this and be more specific.
