[error] ** System NOT running to use fully qualified hostnames ** Kubernetes DNSSRV Strategy #121
Comments
Hi @paltaa - I'm the author of the Kubernetes DNSSRV Strategy - and I've just discovered the same problem in some training material I'm supposed to be teaching tomorrow 😂!!!! This is highly annoying - looks like an arbitrary change in statefulset DNS names - I'll investigate - if only for my own selfish reasons and report back. Thanks for the bug report |
@bitwalker - assign this to me if you like |
Reason for breakage: Google Cloud stopped using CoreDNS as the DNS resolver as of GKE 1.1.3. They are instead using kube-dns, which doesn't provide service discovery via SRV record resolution (obviously they want everyone using the k8s API for everything). Lame.
Release notes (1.1.3) - they switched :
Someone trying to debug the issue :
Stackoverflow thread :
The kubernetes/docker ecosystem is just like that - stuff arbitrarily breaks all the time. I recommend you try using Paul's strategy/kubernetes instead. I'm away but will set a reminder to create a docs PR, maybe changing the name to strategy/k8s-coredns-srv or something that makes it clear you need CoreDNS. Man - that Google - always causing problems! Sorry! |
Thanks a lot for the reply!! Been trying to make this work for a couple of days, followed about 4 different tutorials hahaha, glad to know I helped by posting this issue! let me know if you manage to fix this, good luck tomorrow |
@bryanhuntesl Also, this is currently happening on AWS EKS, which still uses CoreDNS. After some tweaks, the "illegal hostname" and "System NOT running to use fully qualified hostnames" errors are gone too, but it still won't connect |
I have the same problem. My environment is aks (azure kube) v1.15.7. |
@mrchypark Hey, what worked for me is Elixir.Cluster.Strategy.Kubernetes My example: |
@paltaa Thank you for your reply! What does service mean? Is it the Service resource in Kubernetes? |
Yes, the service for kubernetes deployments |
@paltaa Thank you! I'll try this :) |
@paltaa I have one more question T.T What is your node name now? Mine is like I have no error, but the node list is empty too. |
You need to use Distillery: set up a pre-start hook that runs before the Erlang VM is up and sets the ERLANG_NAME environment variable to the pod's local IP |
local ip means pod ip? |
For example. Deployment part:
Then the pre hook:
#!/bin/sh
echo 'Setting ERLANG_NAME...'
vm.args:
-name excalibur@${ERLANG_NAME}
-setcookie ${ERLANG_COOKIE} |
I may have skipped the setcookie part. I'll try this. |
And yes, by local IP I meant pod IP |
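For future readers, here is a minimal sketch of what a complete pre-start hook along these lines could look like. It assumes Distillery sources hook scripts so that exported variables reach the VM, that `REPLACE_OS_VARS=true` is set so the `${ERLANG_NAME}` placeholder in vm.args is substituted, and that `MY_POD_IP` is injected from `status.podIP` in the Deployment. The file path and the `set_erlang_name` helper are hypothetical names for illustration, not something from the thread.

```shell
#!/bin/sh
# rel/hooks/pre_start/set_erlang_name.sh (hypothetical path and file name)
# Assumes MY_POD_IP is injected from status.podIP via the Downward API.

set_erlang_name() {
  pod_ip="$1"
  if [ -z "$pod_ip" ]; then
    echo 'Pod IP is empty; cannot set ERLANG_NAME' >&2
    return 1
  fi
  echo 'Setting ERLANG_NAME...'
  # vm.args then reads this as: -name excalibur@${ERLANG_NAME}
  export ERLANG_NAME="$pod_ip"
}

# Only act when the variable is present, so sourcing this file is harmless elsewhere.
if [ -n "${MY_POD_IP:-}" ]; then
  set_erlang_name "$MY_POD_IP"
fi
```

Failing loudly on an empty IP is deliberate: a node started as `excalibur@` with nothing after the `@` is harder to diagnose than a hook that refuses to run.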
I recommend using KubernetesDNS but it requires a headless service. |
@bryanhuntesl I've assigned this to you as requested, but my question to everyone participating in this thread is whether or not the KubernetesDNS strategy is suitable as a replacement, allowing us to deprecate the DNSSRV strategy if there are issues that make it difficult to use. I'm fine with not deprecating it, but someone from the community has to speak up and take the lead on updating the strategy as appropriate so that it works out of the box. I'm also fine with deprecating it here in libcluster, but handing off the implementation to someone to maintain on their own as a third-party plugin, just let me know. Suffice to say, I won't have time to maintain it myself in the immediate future, and I don't like to keep things around that are broken either, so I'll have to make the call soon. |
Just to clarify, is this only an issue where |
This is an issue because Google Kubernetes Engine removed CoreDNS - the
|
@bryanhuntesl Right, so when I implemented the first k8 service discovery strategy ( Which makes it extra scary when it's documented under K8 but is implementation specific! However I wonder if this is related kubernetes/dns#339 (comment) Which means it's an issue when using hostnames to pods, and not the actual address. I don't have access to GKE/kube-dns, could you test to see if That's an alternative (and one I use) unless IPs aren't good enough or you're using shared hostnames on the pods, which isn't necessary if your intention is to just setup an Erlang cluster. @bitwalker I don't suggest removing it, but renaming it to something like But in general, if |
@seivan sorry I just don't have bandwidth right now, I'm assigned to client work.
@bitwalker if renaming to Cluster.Strategy.CoreDNSSRV is acceptable - I can create a PR in the evening and update the documentation. |
@bryanhuntesl No worries, thanks for all the input. I recall testing locally with Minikube a year ago or so and that in turn runs Besides if it didn't work with |
@bryanhuntesl Sorry for the delay, haven't had a chance to get back to this in a while. I'm good with renaming the strategy, and documenting the caveats. I'll have to bump the major version for the release, but that's fine, we're probably due for that. |
I am getting this same error using this strategy. I have tried both :ip and :dns modes
|
I'm getting similar errors as @michaelst
Elixir version: 1.11.1
Issue resolved. Found out that I can get the templates of env.sh.eex and others with |
I am facing a similar problem.
kind version 0.7.0 Headless Service:
Deployment:
My topology:
Dockerfile:
I am already losing hope that I can resolve this. Does anyone know what it could be? |
@adriano
RUN echo "-name massa_proxy@${PROXY_POD_IP}" >> /app/massa_proxy/apps/massa_proxy/rel/vm.args.eex \
have you tried?
RUN echo "-sname massa_proxy@${PROXY_POD_IP}" >> /app/massa_proxy/apps/massa_proxy/rel/vm.args.eex \
An IP address is not a 'name' as far as Erlang is concerned; a 'name' is an FQDN (fully qualified domain name) such as node-0.<service>.<namespace>.svc.cluster.local.
…On Fri, 2 Apr 2021 at 03:09, Adriano Santos ***@***.***> wrote:
I am facing a similar problem.
I'm using Kind to test my service and I get the error below
2021-04-02 01:52:34.708 ***@***.***:[pid=<0.2378.0> ]:[error]:** System NOT running to use fully qualified hostnames **
** Hostname 10.244.0.27 is illegal **
kind version 0.7.0
Headless Service:
apiVersion: v1
kind: Service
metadata:
  name: proxy-headless-svc
  namespace: default
spec:
  selector:
    app: massa-proxy
  clusterIP: None
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: massa-proxy
  name: massa-proxy
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: massa-proxy
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "9001"
        prometheus.io/scrape: "true"
      labels:
        app: massa-proxy
    spec:
      containers:
        - name: massa-proxy
          image: docker.io/eigr/massa-proxy:0.1.0
          ports:
            - containerPort: 9001
          imagePullPolicy: Always
          env:
            - name: PROXY_POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 9001
              scheme: HTTP
            initialDelaySeconds: 300
            periodSeconds: 3600
            successThreshold: 1
            timeoutSeconds: 1200
          resources:
            limits:
              memory: 1024Mi
            requests:
              memory: 70Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          envFrom:
            - configMapRef:
                name: proxy-cm
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
My topology:
[proxy: [strategy: Cluster.Strategy.Kubernetes.DNS, config: [service: "proxy-headless-svc", application_name: "massa-proxy", polling_interval: 3000]]]
Dockerfile:
FROM elixir:1.10-alpine as builder
ENV MIX_ENV=prod
RUN mkdir -p /app/massa_proxy
WORKDIR /app/massa_proxy
RUN apk add --no-cache --update git build-base zstd
COPY . /app/massa_proxy
RUN rm -rf /app/massa_proxy/apps/massa_proxy/mix.exs \
&& mv /app/massa_proxy/apps/massa_proxy/mix-bakeware.exs \
/app/massa_proxy/apps/massa_proxy/mix.exs
RUN mix local.rebar --force \
&& mix local.hex --force \
&& mix deps.get
RUN echo "-name massa_proxy@${PROXY_POD_IP}" >> /app/massa_proxy/apps/massa_proxy/rel/vm.args.eex \
&& echo "-setcookie ${NODE_COOKIE}" >> /app/massa_proxy/apps/massa_proxy/rel/vm.args.eex
RUN rm -fr /app/massa_proxy/_build \
&& cd /app/massa_proxy/apps/massa_proxy \
&& mix deps.get \
&& mix release.init \
&& mix release
# ---- Application Stage ----
FROM alpine:3
RUN apk add --no-cache --update bash openssl
WORKDIR /home/app
COPY --from=builder /app/massa_proxy/_build/prod/rel/bakeware/ .
COPY apps/massa_proxy/priv /home/app/
RUN adduser app --disabled-password --home app
RUN mkdir -p /home/app/cache
RUN chown -R app: .
USER app
ENV MIX_ENV=prod
ENV REPLACE_OS_VARS=true
ENV BAKEWARE_CACHE=/home/app/cache
ENV PROXY_TEMPLATES_PATH=/home/app/templates
ENTRYPOINT ["./massa_proxy"]
I am already losing hope that I can resolve this. Does anyone know what it could be?
|
Hi @bryanhuntesl, thanks for the quick response.
In this test I used: |
It seems to me that the strategy is managing to resolve the addresses correctly, however, the names of the nodes are like massa-proxy@hostname instead of massa-proxy@ip and this seems to me to be the cause of the problem.
|
Resolved with: Change env.sh.eex
Thanks |
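For future readers on Mix releases (where `mix release.init` generates `rel/env.sh.eex`), the following is a minimal sketch of an IP-based setup for that file. It is not necessarily the exact change made above. It assumes `PROXY_POD_IP` is injected from `status.podIP` as in the Deployment quoted earlier, and that the part before the `@` matches the `application_name` in the topology, since the Kubernetes.DNS strategy connects to `application_name@ip`. The `release_node_for_ip` helper is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Sketch for rel/env.sh.eex: start the node with a long name based on the pod IP.
# release_node_for_ip is a hypothetical helper, not part of Mix or libcluster.

release_node_for_ip() {
  app_name="$1"
  pod_ip="$2"
  if [ -z "$pod_ip" ]; then
    echo 'pod IP is empty; refusing to build a node name' >&2
    return 1
  fi
  printf '%s@%s\n' "$app_name" "$pod_ip"
}

# Resolved when the container starts, not at image build time, so the IP is known.
if [ -n "${PROXY_POD_IP:-}" ]; then
  # "name" selects long names; "sname" would select short names.
  export RELEASE_DISTRIBUTION=name
  export RELEASE_NODE="$(release_node_for_ip massa-proxy "$PROXY_POD_IP")"
fi
```

The "System NOT running to use fully qualified hostnames" message is what a node prints when it is itself running with short names (or without a long name) and is asked to connect to a host containing dots, which is why switching the local node to long names matters here.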
Hi, I was also struggling a lot getting libcluster to work with the DNS SRV topology. After a few hours of toying, I've finally managed to get it to work. I'll share my manifests and config for future readers.
For context,
Firstly, we need a headless service. This service must have
# service.json
{
"metadata": {
"name": "rasmus-headless",
"namespace": "rasmus",
"labels": {
"app.kubernetes.io/name": "rasmus",
"app.kubernetes.io/instance": "rasmus",
"app.kubernetes.io/version": "0.2.13",
"app.kubernetes.io/component": "rasmus"
}
},
"spec": {
"ports": [
{
"port": 1
}
],
"clusterIP": "None",
"selector": {
"app.kubernetes.io/name": "rasmus",
"app.kubernetes.io/instance": "rasmus",
"app.kubernetes.io/component": "rasmus"
}
},
"kind": "Service",
"apiVersion": "v1"
}
Secondly, you'll need to ensure you're providing an environment variable which exposes your pod's IP address. You'll also need to ensure to provide a cookie, otherwise the nodes will not be able to communicate. Example:
"env": [
{
"name": "POD_IP",
"valueFrom": {
"fieldRef": {
"fieldPath": "status.podIP"
}
}
},
{
"name": "RELEASE_COOKIE",
"value": "0123456789abcdef"
}
]
If you haven't already, run
export RELEASE_DISTRIBUTION=name
export RELEASE_NODE="<%= @release.name %>@$(echo "$POD_IP" | sed 's/\./-/g').rasmus-headless.rasmus.svc.cluster.local"
Either replace the domain name
Lastly, ensure the topology is correctly configured. For example,
import Config
config :libcluster,
topologies: [
_: [
strategy: Elixir.Cluster.Strategy.Kubernetes.DNSSRV,
config: [
namespace: "rasmus",
service: "rasmus-headless",
application_name: "rasmus",
polling_interval: 10_000
]
]
]
At this point, the application should be in a position where the nodes can form a cluster.
For a detailed example, please refer to: |
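The name derivation in the `RELEASE_NODE` line above can be sketched as a pair of small shell functions. This is a minimal illustration, assuming the default `cluster.local` cluster domain and the dashed-IP pod hostnames that CoreDNS returns for a headless service (as seen in the `nslookup` output in the issue description). `pod_fqdn` and `node_name_for_pod` are hypothetical helper names, not part of libcluster or Mix.

```shell
#!/bin/sh
# Sketch: derive the long node name that matches an SRV target from a pod IP.
# Assumes the cluster domain is cluster.local; adjust if yours differs.

pod_fqdn() {
  pod_ip="$1"
  service="$2"
  namespace="$3"
  # Turn 192.168.4.107 into 192-168-4-107, the per-pod hostname label.
  dashed_ip="$(printf '%s' "$pod_ip" | sed 's/\./-/g')"
  printf '%s.%s.%s.svc.cluster.local\n' "$dashed_ip" "$service" "$namespace"
}

node_name_for_pod() {
  app_name="$1"
  shift
  # Remaining arguments are: pod IP, service name, namespace.
  printf '%s@%s\n' "$app_name" "$(pod_fqdn "$@")"
}

node_name_for_pod rasmus 192.168.4.107 rasmus-headless rasmus
# prints rasmus@192-168-4-107.rasmus-headless.rasmus.svc.cluster.local
```

The service and namespace passed here have to be the same values used in the topology's `service` and `namespace` options, otherwise the name a node starts with will not match the name its peers try to connect to.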
So I've followed this tutorial:
https://tech.xing.com/creating-an-erlang-elixir-cluster-on-kubernetes-d53ef89758f6
In the logs I'm getting these errors:
14:02:03.126 [error] ** System NOT running to use fully qualified hostnames **
** Hostname 192-168-4-107.ueuropea-excalibur-headless-service.default.svc.cluster.local is illegal **
14:02:03.325 [warn] [libcluster:k8s_excalibur] unable to connect to :"excalibur@192-168-4-107.ueuropea-excalibur-headless-service.default.svc.cluster.local"
`root@ueuropea-excalibur-74df5dddbc-kjfql:/excalibur# nslookup -q=srv ueuropea-excalibur-headless-service.default.svc.cluster.local
Server: 10.100.0.10
Address: 10.100.0.10#53
Non-authoritative answer:
ueuropea-excalibur-headless-service.default.svc.cluster.local service = 0 33 4000 192-168-4-107.ueuropea-excalibur-headless-service.default.svc.cluster.local.
ueuropea-excalibur-headless-service.default.svc.cluster.local service = 0 33 4000 192-168-53-92.ueuropea-excalibur-headless-service.default.svc.cluster.local.
ueuropea-excalibur-headless-service.default.svc.cluster.local service = 0 33 4000 192-168-65-161.ueuropea-excalibur-headless-service.default.svc.cluster.local.
Authoritative answers can be found from:
192-168-65-161.ueuropea-excalibur-headless-service.default.svc.cluster.local internet address = 192.168.65.161
192-168-53-92.ueuropea-excalibur-headless-service.default.svc.cluster.local internet address = 192.168.53.92
192-168-4-107.ueuropea-excalibur-headless-service.default.svc.cluster.local internet address = 192.168.4.107
`
It seems it's missing a . at the end of the returned DNS name, right?
Is there something wrong with the tutorial? Could something be missing?
Entrypoint:
`mix release
#elixir -S mix phx.server --name excalibur@${MY_POD_IP} --cookie "secret"
_build/prod-kubernetes/rel/excalibur/bin/excalibur start`
Deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    io.kompose.service: excalibur
  name: ueuropea-excalibur
spec:
  progressDeadlineSeconds: 90
  replicas: 3
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        io.kompose.service: excalibur
    spec:
      containers:
        - image: 975847796244.dkr.ecr.us-west-2.amazonaws.com/excalibur:dnsrv
          name: excalibur
          imagePullPolicy: Always
          ports:
            - containerPort: 4000
            - containerPort: 4369
          env:
            - name: DATABASE_URL
              value: *****
            - name: MIX_ENV
              value: prod-kubernetes
            - name: SECRET_KEY_BASE
              value: *****
            - name: ERLANG_COOKIE
              value: ******
            - name: MY_POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: ERLANG_CLUSTER_SERVICE_NAME
              value: ueuropea-excalibur-headless-service
          resources: {}
      restartPolicy: Always
status: {}
Headless service:
apiVersion: v1
kind: Service
metadata:
  labels:
    io.kompose.service: excalibur
  name: ueuropea-excalibur-headless-service
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: "http"
      port: 80
      targetPort: 4000
  publishNotReadyAddresses: true
  selector:
    io.kompose.service: excalibur
Let me know if any more information is needed