Add antctl get fqdncache #6868

Open
wants to merge 20 commits into
base: main

Conversation

Dhruv-J
Contributor

@Dhruv-J Dhruv-J commented Dec 16, 2024

This PR adds a new antctl command: antctl get fqdncache. The command fetches the DNS mapping entries for FQDN policies by reading the DNS entry cache and outputting the FQDN name, associated IP, and expiration time for each entry. An optional --domain (-d) flag filters the result to a single FQDN name.

Contributor

@Dyanngg Dyanngg left a comment

Please remove the debug logs before pushing a PR, or mark this as WIP/draft for now. Also, your change seems to break a lot of AgentQuerier interface implementations, as you can see from the lint failures.

@Dhruv-J Dhruv-J marked this pull request as draft December 19, 2024 19:43
@Dhruv-J Dhruv-J force-pushed the antctl-get-fqdncache branch 3 times, most recently from ae05e95 to 6a561ee Compare December 30, 2024 10:22
This PR adds functionality for a new antctl command: antctl get fqdncache
This command fetches the DNS mapping entries for FQDN policies by reading the
cache for DNS entries and outputting FQDN name, associated IP, and expiration
time. It can also be used with a --domain (-d) flag that can be specified to
filter the result for only a certain FQDN name.

Signed-off-by: Dhruv-J <[email protected]>
@Dhruv-J Dhruv-J force-pushed the antctl-get-fqdncache branch 2 times, most recently from 0f68993 to 17487a4 Compare January 2, 2025 20:50
@Dhruv-J Dhruv-J marked this pull request as ready for review January 6, 2025 21:32
@Dhruv-J
Contributor Author

Dhruv-J commented Jan 6, 2025

/test-all

@Dhruv-J
Contributor Author

Dhruv-J commented Jan 7, 2025

/test-conformance
/test-networkpolicy
/test-e2e

"time"

"github.com/stretchr/testify/require"
"gotest.tools/assert"
Contributor

use github.com/stretchr/testify/assert to avoid extra dependency added in go.mod

@@ -34,6 +34,7 @@ IMAGE_NAME="antrea/codegen:kubernetes-1.31.1-build.1"
# to the "safe" list in the Git config when starting the container (required
# because of user mismatch).
if git_status=$(git status --porcelain --untracked=no 2>/dev/null) && [[ -n "${git_status}" ]]; then
echo ${git_status}
Contributor

unrelated change

@@ -34,6 +34,7 @@ IMAGE_NAME="antrea/codegen:kubernetes-1.31.1-build.1"
# to the "safe" list in the Git config when starting the container (required
# because of user mismatch).
if git_status=$(git status --porcelain --untracked=no 2>/dev/null) && [[ -n "${git_status}" ]]; then
echo ${git_status}
Contributor

ditto

return false
}

// maybe some helper funcs needed here to help parse through ^^^^
Contributor

Can you explain what this means

Contributor Author

was a dev note, removing now, as functionality has been added

for _, dnsCacheEntryExp := range expectedEntryList {
found := false
for j, dnsCacheEntryRet := range returnedList {
if !found && dnsCacheEntryExp.FqdnName == dnsCacheEntryRet.FqdnName {
Contributor

Did you try if assert.ElementsMatch works out of the box for comparing those lists?
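
For context, assert.ElementsMatch from github.com/stretchr/testify compares two lists while ignoring element order, which is what the nested loops above do by hand. A minimal stdlib-only sketch of the same idea follows. The entry type and the sameElements helper are hypothetical stand-ins, not code from this PR, and the sketch assumes entries are comparable values (the real entry type may hold a net.IP, which cannot be used as a map key).

```go
package main

import "fmt"

// entry is a hypothetical, comparable stand-in for the cache entry type
// used in the test; it is not the type defined in this PR.
type entry struct {
	FQDNName  string
	IPAddress string
}

// sameElements reports whether a and b hold the same entries with the same
// multiplicity, regardless of order.
func sameElements(a, b []entry) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[entry]int, len(a))
	for _, e := range a {
		counts[e]++
	}
	for _, e := range b {
		if counts[e] == 0 {
			return false // b has an entry that a lacks (or has fewer of)
		}
		counts[e]--
	}
	return true
}

func main() {
	expected := []entry{{"example.com", "10.0.0.1"}, {"antrea.io", "10.0.0.4"}}
	returned := []entry{{"antrea.io", "10.0.0.4"}, {"example.com", "10.0.0.1"}}
	fmt.Println(sameElements(expected, returned))
}
```

In a test that already imports testify, the whole comparison would likely reduce to a single assert.ElementsMatch(t, expectedEntryList, returnedList) call.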

}
}

func (r FqdnCacheResponse) SortRows() bool {

why is the default option for sorting false?

Contributor Author

the rows of the table won't need to be sorted, as there's no filter or flag for it, so the default is false

Contributor

I don't remember how this function is used exactly, but it would be nice if the output rows were filtered based on the FQDN.

assert.Equal(t, tt.expectedStatus, recorder.Code)
if tt.expectedStatus == http.StatusOK {
var received []map[string]interface{}
err = json.Unmarshal(recorder.Body.Bytes(), &received)

Are there any tests to add that check the marshal and unmarshal operations work correctly?

Contributor Author

Triggering this error manually would be difficult, as the JSON transformation is not something the test directly controls. It is also not tested in other PRs that add antctl functionality.

t.Run(tt.name, func(t *testing.T) {
reqByte, _ := json.Marshal(tt.fqdnList)
reqReader := bytes.NewReader(reqByte)
result, err := Transform(reqReader, false, tt.opts)

Is there any test to add for the specific transform function?

Contributor Author

the transform function is tested as part of response.go, if it fails, then the test will fail as well

expirationTime time.Time
}

func (r FqdnCacheResponse) GetTableHeader() []string {

can you check if we can add specific test cases to cover this in code cov

Contributor Author

It seems to be left uncovered by convention, as it is a simple method that primarily relies on the Transform() function to execute correctly. I don't think unit tests should be added for these funcs.

@elton-furtado

The Kind e2e test in noEncap mode is also failing.

Signed-off-by: Dhruv-J <[email protected]>
@Dhruv-J
Contributor Author

Dhruv-J commented Jan 8, 2025

/test-conformance
/test-networkpolicy
/test-e2e

@@ -72,6 +74,28 @@ func (r AntreaAgentInfoResponse) SortRows() bool {
return true
}

type FqdnCacheResponse struct {
Contributor

should be FQDNCacheResponse per our coding conventions
you can refer to Quan's document: https://github.com/tnqn/code-review-comments

Comment on lines 29 to 31
if dnsEntryCache == nil {
return
}
Contributor

note that your GetFqdnCache function will never return nil as far as I can tell. It's fine to keep the code as is, but maybe add a comment to this effect.

Contributor Author

I think that was for the older function, when it would return nil instead of empty, but that's fixed now, so I can remove this bit of code.

Comment on lines 913 to 936
"*.example.com": {
responseIPs: map[string]ipWithExpiration{
"maps.example.com": {
ip: net.ParseIP("10.0.0.1"),
expirationTime: time.Date(2025, 12, 25, 15, 0, 0, 0, time.UTC),
},
"mail.example.com": {
ip: net.ParseIP("10.0.0.2"),
expirationTime: time.Date(2025, 12, 25, 15, 0, 0, 0, time.UTC),
},
"photos.example.com": {
ip: net.ParseIP("10.0.0.3"),
expirationTime: time.Date(2025, 12, 25, 15, 0, 0, 0, time.UTC),
},
},
},
"antrea.io": {
responseIPs: map[string]ipWithExpiration{
"antrea.io": {
ip: net.ParseIP("10.0.0.4"),
expirationTime: time.Date(2025, 12, 25, 15, 0, 0, 0, time.UTC),
},
},
},
Contributor

the test input data doesn't match what we actually store in the cache
There is never a wildcard domain ("*.example.com") as the key in the dnsEntryCache map
For the nested map (map[string]ipWithExpiration) the key is actually the string representation of the IP (for historical reasons). See:

// Key for responseIPs is the string representation of the IP.
// It helps to quickly identify IP address updates when a
// new DNS response is received.
responseIPs map[string]ipWithExpiration

So it's actually like this:

		"antrea.io": {
			responseIPs: map[string]ipWithExpiration{
				"10.0.0.4": {
					ip:             net.ParseIP("10.0.0.4"),
					expirationTime: time.Date(2025, 12, 25, 15, 0, 0, 0, time.UTC),
				},
			},
		},
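
To make the layout described above concrete, here is a small sketch of a cache keyed by FQDN, with a nested map keyed by the string form of each IP, flattened into one row per (FQDN, IP) pair. The field names ip, expirationTime, and responseIPs come from the snippet quoted in this comment; the dnsMeta and cacheEntry type names and the flatten helper are assumptions made for illustration and may not match the actual Antrea code.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

type ipWithExpiration struct {
	ip             net.IP
	expirationTime time.Time
}

// dnsMeta is an assumed name for the per-FQDN cache value.
type dnsMeta struct {
	// Key for responseIPs is the string representation of the IP.
	responseIPs map[string]ipWithExpiration
}

// cacheEntry is a hypothetical flattened row: one per (FQDN, IP) pair.
type cacheEntry struct {
	FQDNName       string
	IPAddress      net.IP
	ExpirationTime time.Time
}

// flatten walks the two-level cache and emits one entry per resolved IP.
// Map iteration order is random, so the result order is not stable.
func flatten(cache map[string]dnsMeta) []cacheEntry {
	entries := make([]cacheEntry, 0, len(cache))
	for fqdn, meta := range cache {
		for _, r := range meta.responseIPs {
			entries = append(entries, cacheEntry{
				FQDNName:       fqdn,
				IPAddress:      r.ip,
				ExpirationTime: r.expirationTime,
			})
		}
	}
	return entries
}

func main() {
	expirationTime := time.Now().Add(1 * time.Hour)
	cache := map[string]dnsMeta{
		"antrea.io": {
			responseIPs: map[string]ipWithExpiration{
				"10.0.0.4": {ip: net.ParseIP("10.0.0.4"), expirationTime: expirationTime},
			},
		},
	}
	for _, e := range flatten(cache) {
		fmt.Println(e.FQDNName, e.IPAddress, e.ExpirationTime.Format(time.RFC3339))
	}
}
```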

recorder := httptest.NewRecorder()
handler.ServeHTTP(recorder, req)
assert.Equal(t, tt.expectedStatus, recorder.Code)
if tt.expectedStatus == http.StatusOK {
Contributor

I would just remove the if here, as the status code is always 200

Note that this unit test actually validates very little (which is expected), so in my opinion we didn't really need multiple test cases, just the second one ("FQDN cache exists - multiple addresses multiple domains") would have been fine as a basic (not table-driven) test.

if err != nil {
return nil, err
}
domain, exists := opts["domain"]
Contributor

I am not sure why you chose this approach, but I think we usually do server-side filtering. Did you follow a specific example that prompted you to do client-side filtering?
The advantage of server-side filtering is that the dns cache can be huge, so being able to filter on a specific domain name in the Agent directly would be more efficient.
BTW, I think we should support a wildcard domain in the query, as that would be useful when debugging a policy rule which uses a wildcard in the FQDN. cc @Dyanngg
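
As a rough illustration of the wildcard query suggested above, the sketch below turns a domain such as "*.example.com" into an anchored regular expression and filters a list of cached names with it. Both wildcardToRegexp and filterDomains are hypothetical helpers, and the conversion rule (every "*" matches any run of characters, dots included) is an assumption; the matching rules Antrea applies to FQDN policy rules may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wildcardToRegexp is a hypothetical helper: it escapes the query and then
// turns each escaped "*" back into ".*", anchoring the whole expression.
func wildcardToRegexp(domain string) (*regexp.Regexp, error) {
	quoted := regexp.QuoteMeta(domain)             // escapes '.' and '*'
	expr := strings.ReplaceAll(quoted, `\*`, `.*`) // re-enable the wildcard
	return regexp.Compile("^" + expr + "$")
}

// filterDomains keeps only the cached domain names that match the query.
func filterDomains(domains []string, query string) ([]string, error) {
	re, err := wildcardToRegexp(query)
	if err != nil {
		return nil, err
	}
	matched := []string{}
	for _, d := range domains {
		if re.MatchString(d) {
			matched = append(matched, d)
		}
	}
	return matched, nil
}

func main() {
	cached := []string{"maps.example.com", "mail.example.com", "antrea.io"}
	matched, err := filterDomains(cached, "*.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(matched)
}
```

Running this filter in the Agent handler rather than in antctl means only the matching entries cross the wire, which is the efficiency argument made above.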

cacheEntryList := []types.DnsCacheEntry{}
var pattern *regexp.Regexp
var err error
if fqdnFilter != (querier.FQDNCacheFilter{}) {
Contributor

This is quite unconventional imo, and I doubt it works the way you think it would. In newFilterFromURLQuery, it would return a new FQDNCacheFilter with "DomainName" set to "" if no filter is provided in the URL, thus not matching the else branch below. What can be done instead is to make fqdnFilter a pointer and check whether it is nil to see if there are any filtering requirements at all.
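
A minimal sketch of the pointer-based approach, assuming the filter is carried in a "domain" query parameter and using an exact-match comparison to keep the example short. The FQDNCacheFilter shape, the parameter name, and the selectDomains helper are illustrative assumptions, not the PR's final code.

```go
package main

import (
	"fmt"
	"net/url"
)

// FQDNCacheFilter mirrors the filter type discussed in this thread; the
// single Domain field is an assumption for this sketch.
type FQDNCacheFilter struct {
	Domain string
}

// newFilterFromURLQuery returns nil when the request carries no filter, so
// callers can distinguish "no filter" from "filter with an empty domain".
func newFilterFromURLQuery(query url.Values) *FQDNCacheFilter {
	domain := query.Get("domain")
	if domain == "" {
		return nil
	}
	return &FQDNCacheFilter{Domain: domain}
}

// selectDomains is a hypothetical helper: with a nil filter it skips
// matching entirely and returns everything.
func selectDomains(cached []string, filter *FQDNCacheFilter) []string {
	if filter == nil {
		return cached
	}
	selected := []string{}
	for _, d := range cached {
		if d == filter.Domain {
			selected = append(selected, d)
		}
	}
	return selected
}

func main() {
	cached := []string{"antrea.io", "example.com"}
	filter := newFilterFromURLQuery(url.Values{"domain": {"antrea.io"}})
	fmt.Println(selectDomains(cached, filter))
	fmt.Println(selectDomains(cached, newFilterFromURLQuery(url.Values{})))
}
```

The nil check happens once per call instead of comparing against an empty struct for every cache entry.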

// FQDNCacheFilter is used to filter the result while retrieving FQDN cache
type FQDNCacheFilter struct {
// The Name or wildcard matching expression of the domain that is being filtered
DomainName string
Contributor

nit: I'll just call it Domain since it can actually be a regex

}
for fqdn, dnsMeta := range c.fqdnController.dnsEntryCache {
for _, ipWithExpiration := range dnsMeta.responseIPs {
if fqdnFilter == (querier.FQDNCacheFilter{}) || pattern.MatchString(fqdn) {
Contributor

fqdnFilter == nil will also be a much more appropriate check here to avoid constructing an empty FQDNCacheFilter for each dns data in cache.

Contributor

@antoninbas antoninbas left a comment

There is a key issue with the PR right now. You have several types that you use for the same purpose but they are not used correctly IMO.

  • FQDNCacheResponse in pkg/agent/apis
  • DnsCacheEntry in pkg/agent/types
  • Response in pkg/antctl/transform/fqdncache, which wraps a *DnsCacheEntry.

FQDNCacheResponse is actually not really used at all. It is definitely not used on the server side, as you are marshaling a []DnsCacheEntry. It does not have any unexported

DnsCacheEntry (and everything in pkg/agent/types) is not really meant to be used in antctl - you will notice that antctl didn't import this package prior to your change. There is also nothing with json tags in pkg/agent/types. This is what the pkg/agent/apis is meant to be for.

The data you marshal on the server side and unmarshal on the client side (antctl) should have the same type to ensure correctness.

There are 2 possible approaches:

  1. Fix FQDNCacheResponse (export fields, add json tags, maybe replace the net.IP with a string). In the handler, convert each DnsCacheEntry to a FQDNCacheResponse manually. You can refer to the bgp* agent handlers for examples. json tags should be removed from DnsCacheEntry, as this type is not meant to be serialized.
  2. Unify FQDNCacheResponse and DnsCacheEntry as a single type (there again, I would use a string instead of a net.IP for the IP address field), and use it everywhere. We have one handler like this: pkg/agent/apiserver/handlers/serviceexternalip/handler.go, with the ServiceExternalIPInfo struct.

In both cases I think you can / should remove pkg/antctl/transform/fqdncache/response.go.

I don't have a strong preference between the 2 approaches. Typically I'd go with the first one and keep 2 types. The conversion logic is a good place to convert the net.IP field in DnsCacheEntry to a string field in FQDNCacheResponse.
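
A rough sketch of the first approach, with an internal entry type that has no json tags and a separate API type that does. The type names follow the ones used in this review, but the exact fields, the expirationTime json tag, and the toResponses helper are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"time"
)

// DnsCacheEntry stands in for the internal type in pkg/agent/types; it
// carries no json tags because it is not meant to be serialized.
type DnsCacheEntry struct {
	FQDNName       string
	IPAddress      net.IP
	ExpirationTime time.Time
}

// FQDNCacheResponse stands in for the API type in pkg/agent/apis. The IP is
// a string here, and the expirationTime tag is an assumed name.
type FQDNCacheResponse struct {
	FQDNName       string    `json:"fqdnName,omitempty"`
	IPAddress      string    `json:"ipAddress,omitempty"`
	ExpirationTime time.Time `json:"expirationTime"`
}

// toResponses converts internal entries to API responses, pre-allocating
// the result slice since its final length is known up front.
func toResponses(entries []DnsCacheEntry) []FQDNCacheResponse {
	resp := make([]FQDNCacheResponse, 0, len(entries))
	for _, e := range entries {
		resp = append(resp, FQDNCacheResponse{
			FQDNName:       e.FQDNName,
			IPAddress:      e.IPAddress.String(),
			ExpirationTime: e.ExpirationTime,
		})
	}
	return resp
}

func main() {
	entries := []DnsCacheEntry{{
		FQDNName:       "antrea.io",
		IPAddress:      net.ParseIP("10.0.0.4"),
		ExpirationTime: time.Date(2025, 12, 25, 15, 0, 0, 0, time.UTC),
	}}
	out, err := json.Marshal(toResponses(entries))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With this split, antctl would only need to import the API type, and the same type is used for marshaling on the Agent side and unmarshaling on the client side.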

@@ -47,6 +48,7 @@ type AgentQuerier interface {
GetMemberlistCluster() memberlist.Interface
GetNodeLister() corelisters.NodeLister
GetBGPPolicyInfoQuerier() querier.AgentBGPPolicyInfoQuerier
GetFqdnCache(querier.FQDNCacheFilter) []types.DnsCacheEntry
Contributor

This seems wrong: there is no need to add this function to the interface because you can already call GetNetworkPolicyInfoQuerier().GetFqdnCache(filter), which is what we do for other handlers.

func HandleFunc(aq agentquerier.AgentQuerier) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
fqdnFilter := newFilterFromURLQuery(r.URL.Query())
dnsEntryCache := aq.GetFqdnCache(fqdnFilter)
Contributor

should be GetFQDNCache per the style guide. Please update all other occurrences accordingly.

Comment on lines 77 to 78
FqdnName string `json:"fqdnName,omitempty"`
IpAddress string `json:"ipAddress,omitempty"`
Contributor

Suggested change
FqdnName string `json:"fqdnName,omitempty"`
IpAddress string `json:"ipAddress,omitempty"`
FQDNName string `json:"fqdnName,omitempty"`
IPAddress string `json:"ipAddress,omitempty"`

Please review other occurrences as well

return func(w http.ResponseWriter, r *http.Request) {
fqdnFilter := newFilterFromURLQuery(r.URL.Query())
dnsEntryCache := npq.GetFQDNCache(fqdnFilter)
resp := []agentapi.FQDNCacheResponse{}
Contributor

Suggested change
resp := []agentapi.FQDNCacheResponse{}
resp := make([]agentapi.FQDNCacheResponse, 0, len(dnsEntryCache))

to pre-allocate the memory

{
name: "FQDN cache exists - multiple addresses multiple domains",
expectedStatus: http.StatusOK,
expectedResponse: []types.DnsCacheEntry{
Contributor

Rename this field to filteredCacheEntries and make expectedResponse a field of type []FQDNCacheResponse instead. It will feel like you are duplicating data, but really it will lead to a cleaner test.

Comment on lines 551 to 554
if err != nil {
// this pattern will match no strings if there is an error with the regex formatting or usage with the user specified --domain flag
pattern = regexp.MustCompile(`a\A`)
}
Contributor

we should handle such errors better, i.e., by returning a BadRequest HTTP code if the domain arg is not valid.

so maybe the regex conversion should happen in the handler and FQDNCacheResponse should include a regex instead of a string? cc @Dyanngg

Contributor Author

what about a warning displayed to the user but still an empty output?

Contributor

if it's an invalid pattern, it should be an error. A warning in the Antrea Agent logs is much harder to find. We should not treat 1) an unsupported pattern, and 2) a valid pattern that doesn't match anything, in the same way IMO.
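
One way to keep the two cases apart is to compile the pattern in the handler and reject it there. The sketch below assumes the pattern arrives as a "domain" query parameter and is compiled directly as a regular expression; the handler name, the /fqdncache path, and the placeholder "[]" body are hypothetical and stand in for the real cache lookup.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"regexp"
)

// handleFQDNCache is a hypothetical handler: an invalid pattern is a client
// error (400), while a valid pattern always yields 200, even when nothing
// in the cache matches.
func handleFQDNCache(w http.ResponseWriter, r *http.Request) {
	domain := r.URL.Query().Get("domain")
	if domain != "" {
		if _, err := regexp.Compile(domain); err != nil {
			http.Error(w, "invalid domain pattern: "+err.Error(), http.StatusBadRequest)
			return
		}
	}
	// Placeholder body; a real handler would encode the matching entries.
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, "[]")
}

// statusFor issues a GET against the handler and returns the status code.
func statusFor(target string) int {
	rec := httptest.NewRecorder()
	handleFQDNCache(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/fqdncache?domain=antrea.io"))   // valid pattern
	fmt.Println(statusFor("/fqdncache?domain=%5Bunclosed")) // "[unclosed" does not compile
}
```

antctl can then surface the 400 response to the user directly, instead of printing an empty table for a mistyped pattern.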

Contributor

oops I thought you were done addressing comments, but looks like it is not the case
I will review again later

Comment on lines 555 to 558
} else {
// this pattern will match all strings if the filter is unset
pattern = regexp.MustCompile(`.*`)
}
Contributor

not needed, you should just skip regex evaluation in this case

Signed-off-by: Dhruv-J <[email protected]>
Signed-off-by: Dhruv-J <[email protected]>
Contributor

@antoninbas antoninbas left a comment

some more comments

FqdnName: "example.com",
IpAddress: net.ParseIP("10.0.0.1"),
FQDNName: "example.com",
IPAddress: net.ParseIP("10.0.0.1"),
ExpirationTime: time.Date(2025, 12, 25, 15, 0, 0, 0, time.UTC),
Contributor

Since the same time is used everywhere, I suggest you declare one variable of type time.Time at the beginning of the test function, which you can then use everywhere. Actually, we usually do something like this:

expirationTime := time.Now().Add(1*time.Hour)

then use ExpirationTime: expirationTime everywhere

expectedResponse: []apis.FQDNCacheResponse{
{
FQDNName: "example.com",
IPAddress: net.ParseIP("10.0.0.1").String(),
Contributor

Suggested change
IPAddress: net.ParseIP("10.0.0.1").String(),
IPAddress: "10.0.0.1",

same in other places below

handler.ServeHTTP(recorder, req)
assert.Equal(t, tt.expectedStatus, recorder.Code)
var receivedResponse []map[string]interface{}
err = json.Unmarshal(recorder.Body.Bytes(), &receivedResponse)
Contributor

unmarshal the response as a []apis.FQDNCacheResponse, not as a []map[string]interface{}

and then just use assert.Equal(t, tt.expectedResponse, receivedResponse)
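
A stdlib-only sketch of the typed round trip: the body is decoded into a slice of the response type rather than generic maps, and a single expirationTime variable is shared by every expected entry. The FQDNCacheResponse fields, the expirationTime json tag, and the decode and equalResponses helpers are assumptions for illustration; in the real test, assert.Equal from testify would replace the manual comparison.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// FQDNCacheResponse is a stand-in for the API type discussed in this review.
type FQDNCacheResponse struct {
	FQDNName       string    `json:"fqdnName,omitempty"`
	IPAddress      string    `json:"ipAddress,omitempty"`
	ExpirationTime time.Time `json:"expirationTime"`
}

// decode unmarshals a response body into the typed slice.
func decode(body []byte) ([]FQDNCacheResponse, error) {
	var received []FQDNCacheResponse
	err := json.Unmarshal(body, &received)
	return received, err
}

// equalResponses compares field by field, using time.Time.Equal so that
// only the instant in time matters for the expiration.
func equalResponses(a, b []FQDNCacheResponse) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i].FQDNName != b[i].FQDNName ||
			a[i].IPAddress != b[i].IPAddress ||
			!a[i].ExpirationTime.Equal(b[i].ExpirationTime) {
			return false
		}
	}
	return true
}

func main() {
	// Declared once and reused by every expected entry.
	expirationTime := time.Date(2025, 12, 25, 15, 0, 0, 0, time.UTC)
	expected := []FQDNCacheResponse{
		{FQDNName: "example.com", IPAddress: "10.0.0.1", ExpirationTime: expirationTime},
	}
	body, err := json.Marshal(expected)
	if err != nil {
		panic(err)
	}
	received, err := decode(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(equalResponses(expected, received))
}
```

The time package documentation recommends Equal over == for comparing time.Time values, which is worth keeping in mind if the expected time in a test is derived from time.Now().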

responseIPs: map[string]ipWithExpiration{
"10.0.0.1": {
ip: net.ParseIP("10.0.0.1"),
expirationTime: time.Date(2025, 12, 25, 15, 0, 0, 0, time.UTC),
Contributor

same comment as for the handler unit test

4 participants