Skip to content
This repository has been archived by the owner on Sep 6, 2023. It is now read-only.

metal_port resource #116

Merged
merged 17 commits into from
Aug 18, 2021
Merged

metal_port resource #116

merged 17 commits into from
Aug 18, 2021

Conversation

t0mk
Copy link
Contributor

@t0mk t0mk commented May 26, 2021

This is WIP for resource metal_port as suggested in #12
related are equinix/terraform-provider-equinix#193 and #95

Signed-off-by: Tomas Karasek [email protected]

// convert to layer3
if l2Ok && !l2.(bool) && isLayer2 {
ips := []packngo.AddressRequest{
{AddressFamily: 4, Public: true},
Copy link
Member

@displague displague May 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future it would be nice to provide the AddressRequest arguments for selecting existing addresses and creating EIPs through this metal_port resource.

This potentially avoids the need for metal_ip_address_attachment resources. That can be figured out in the future. We would have to consider what to do when the two resources are in conflict (metal_ip_address_attachment creates attachments not found in the metal_port list. Does the metal_port resource become the defacto way to specify all of these features?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the L3 conversion has the possibility to specify address requests, but I'd leave it to a PR in the future, separately form the initial introduction of metal_port. I feel there will be other issues to focus on.

Also, it's not possible to specify IPs in the console when converting to L3.

return err
}

// Constraint: Only bond ports have layer2 mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of these commented sections could be its own function to keep the cyclomatic complexity down. Your comment block at https://github.com/equinix/terraform-provider-metal/pull/116/files#diff-266370b6fe0d745e5c92c5193546cb0a9f8d0c56ef0b65355e63759fc44b2b92R188-R195 would become a function calling each of these functions.


func resourceMetalPortRead(d *schema.ResourceData, meta interface{}) error {
client := meta.(*packngo.Client)
port, err := getPortByResourceData(d, client)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it seem unusual for a resource to load from either id or a composite key (name,device_id)? It is usual for data sources to work this way, but I wonder if this is safe on a resource.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not safe and I removed it. metal_port now needs only port id

}

d.SetId(port.ID)
return setMap(d, m)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The setMap helper takes functions where the if blocks could be resolved. It's the pattern that I've been using.

I don't see anything wrong with the approach you've taken. It's flatter.

@t0mk
Copy link
Contributor Author

t0mk commented Jun 3, 2021

I pushed more work on the metal_port.

Each of these commented sections could be its own function to keep the cyclomatic complexity down. Your comment block

I refactored the metal_port create/update in order to tidy it up. I had to make some design decisions, like the ClientResourcePort struct.

Does it seem unusual for a resource to load from either id or a composite key ...

Yes, it is, and it's causing some trouble, e.g. we can't have id as resource attribute (id is only allowed in datasources). I think I will have to keep the read functions for the port datasource and resource separate.

This is still work in progress.

@displague Maybe you can think about what the Delete action of metal_port resource should do. In terms of port bond state, attached VLANs, Layer2/3. I originally wanted to keep the Delete method a nil-returning dummy, but I'm not sure how we'd then unattach VLANs from ports.

@displague
Copy link
Member

displague commented Jun 7, 2021

@t0mk My first hunch is to restore the port defaults. The defaults are the port configuration that comes when you create a device and do not specify anything about IP addresses.

I'm not 100% sure about this, but I think the defaults are:

  • bond0: bonded, no vlans, ipv4-public, ipv6-public, ipv4-managed
    • eth0: layer3
    • eth1: layer3
  • bond1: bonded, no-vlans
    • eth2: layer2
    • eth3: layer2

I do worry about side effects, like potentially losing the active IP address on the node.
If we let users toggle addresses on ports in the future, would we want to expose those dynamically resolved addresses as properties of the port or should users look up the device to get the revised addresses?

Port reconfiguration is going to be tricky from a usability perspective whenever the addresses are changed as a side-effect. We'll have to provide guides that offer strategies to avoid losing networking capabilities in the OS.

Because any metal_port change could be destructive, I think it is ok for the delete to also potentially be destructive, while also being restorative (back to the original state).

Perhaps we could start with this and introduce a toggle property reset_on_delete = false (better name?) later if users would prefer the no-op delete.

@t0mk
Copy link
Contributor Author

t0mk commented Jun 10, 2021

@displague sorry for radio silence, I've been testing metal_port quite a bit. I pushed another round of fixes, but it's not very nteresting this time.

I agree with the optional reset to defaults on Delete. I'll do it next.

There are some race conditions when handling VLANs with metal_port, for example when assigning and unassigning from different ports in the same tf run. Or when disbonding bond0 and unassigning vlans from eth1. I will need to look on how to sync those, and whether it's even possible.

Next, I will also show templates with configurations for particular network types.

@t0mk t0mk force-pushed the port_resource branch 2 times, most recently from 84587af to 3f5a3fc Compare June 13, 2021 10:21
@t0mk
Copy link
Contributor Author

t0mk commented Jun 14, 2021

@displague I added

  • reset_on_delete for restoration to defaults when tf resource metal_port is deleted
  • I added draft of doc, please take a look.
  • There are 2 race conditions that I didn't resolve:
    • assigning and removing the same VLAN in the same terraform run, e.g. a VLAN is assigned to eth1 via metal_port.vlan_ids and user moves it to bond0.vlan_id. There's no guarantee that the update on eth1 will be run before the update of bond0
    • Bonding a bond port where underlying eth port has vlans assigned, and those vlans are being removed in the same terraform run, e.g. eth1 has VLAN assigned and user changes bonded to true in bond0. There's no way to check the constraint that eth1 should not have VLANs assigned when bonding bond0. There's no link from bond port to underlying eth ports, only the other way around, from eth ports to their bond port.

The penultimate item is nicely solved by the port_vlan_attachment resource.

Not sure how to resolve this. Do you know of any tf resource that does the attachments without a special resource, so I could take a look on how they did it?

@displague displague self-requested a review June 14, 2021 13:41
@displague
Copy link
Member

@t0mk and I have discussed how equinixmetal-archive/packngo#291 may resolve some of the racing issues described above.

@displague
Copy link
Member

The PR that offers bulk VLAN attachments is available at equinixmetal-archive/packngo#291. Before adopting it for this PR, we should make sure the feature works outside of IBX facilities. (@jordan0day)

t0mk and others added 8 commits August 6, 2021 01:09
Signed-off-by: Tomas Karasek <[email protected]>
Signed-off-by: Tomas Karasek <[email protected]>
Signed-off-by: Tomas Karasek <[email protected]>
Signed-off-by: Marques Johansson <[email protected]>
}
return nil

// all device ports must be unassigned before delete
Copy link
Member

@displague displague Aug 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t0mk we can wrap this behavior behind an attribute (force_delete? unassign_on_delete?), but if we do I think we may want it to be the default behavior.

To some extent, I see this as simply working around an API limitation. When I ask for this type of resource to be deleted, I expect that the associations would be deleted automatically.

As we've seen in the failed tests, it seems this kind of behavior will be required to get around the dependency ordering (vlan and device are independent, port is dependent). If you try to delete a metal_vlan, it depends_on the device ports being updated first (which may have some affect on the metal_port and metal_port_vlan_attachment resources). I think we've seen this problem before in tests before metal_port.

We might have similar problems if a metal_vlan must be recreated while a metal_port is consuming it (or would Terraform handle that successfully?) I'll need to add a test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@displague I see what you are trying to do. I was thinking about this and can't come up with a scenario where user would remove a metal_vlan and wanted to keep the associations. So I think that unassign_on_delete is not necessary, let's keep it as you've just done it. I also don't think that the disassociation on removal will harm existing setups.

It's also OK with the vlan_attachment resources, because those will be always removed before metal_vlan (unless imported in some crazy way).

IMO, re-creation when changing will not be a problem with vlan_attachment, but will be a problem with metal_port. With metal_port, a re-created (and dissociated) VLAN will not be re-attached to the device port, because the vlan attachment happens on creation (or update) of metal_port.

metal/resource_metal_port.go Outdated Show resolved Hide resolved
}

for _, f := range [](func(*ClientPortResource) error){
removeNativeVlan,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can't use removeNativeVlan, makeBond and convertToL3 here. Those functions will make sure that the different aspects of the port configuration (native VLAN, bond, L3) are set according to the resource conf. That's why the schema.ResourceData is part of the CPR struct. If you call e.g. convertToL3 to a resource

resource "metal_port" "bond0" {
  port_id = local.bond0_id
  layer2 = true
  bonded = true
  reset_on_delete = true

.. it will not convert the port to L3 because in the resource conf, layer2 is true.

What you want to achieve here is to put the port to somehow default state: no vlans, no native vlan, L3, bonded. You don't need the Resource struct for that, you shouldn't look at the current configuration.

The function should go like

func portToDefault(c *packngo.Client, portID string) error {
	getOpts := &packngo.GetOptions{Includes: []string{
		"native_virtual_network",
		"virtual_networks",
	}}
	port, _, err := c.Ports.Get(portID, getOpts)
	if err != nil {
		return err
	}

	if port.NativeVirtualNetwork != nil {
		// remove native VLAN with the new bulk API
	}

	if len(port.AttachedVirtualNetworks) > 0 {
		// remove VLANs with the new bulk API
	}

	// create bond
	if !port.Data.Bonded {
		port, _, err = c.DevicePorts.Bond(port, false)
		if err != nil {
			return err
		}
	}
	if contains(l2Types, port.NetworkType) {
		ips := []packngo.AddressRequest{
			{AddressFamily: 4, Public: true},
			{AddressFamily: 4, Public: false},
			{AddressFamily: 6, Public: true},
		}
		_, _, err := c.Ports.ConvertToLayerThree(port.ID, ips)
		if err != nil {
			return err
		}
	}
	return nil
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can then have the delete function for metal_port as:

func resourceMetalPortDelete(d *schema.ResourceData, meta interface{}) error {
        client := meta.(*packngo.Client)
	resetRaw, resetOk := d.GetOk("reset_on_delete")
	if resetOk && resetRaw.(bool) {
		return portToDefault(client, d.Id())
        }
        return nil
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the concerns I have is in how reset_on_delete should behave on eth0 vs eth1 vs bond0 vs bond1. We are taking on some authority about how the Equinix Metal API is expected to behave without the Equinix Metal API expressly stating that behavior, or offering an API endpoint to exact the defaults.

I know of, for example, an operating system that is deployed with Layer2-Individual as the default port configuration.

Perhaps this is just a documentation concern.

reset_on_delete will reset the port to the most common default values when this resource is deleted. This setting may impact other ports on the same bond. For eth0, eth1, and bond0 ports, the port will be bonded (with both eth0 and eth1) in Layer3 mode, potentially issuing new IP addresses, with no VLANs attached. For eth2, eth3, and bond1 ports, the port will be converted to Layer2-Bonded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@displague hmm, I see, the defaults might be different. Well we could document that reset_on_default will mean that the bond will go to bonded L3, and users shouldn't be using reset_on_delete with the device type which defaults to L2-bonded.

Copy link
Member

@displague displague Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you want to achieve here is to put the port to somehow default state: no vlans, no native vlan, L3, bonded. You don't need the Resource struct for that, you shouldn't look at the current configuration.

In 4e4264f I'm taking advantage of our helper functions (that rely ResourceData) by first calling d.Set within Delete. I think this will let us avoid duplicating the reset code in another function.

@tombuildsstuff raised a concern about reading properties from the resource because the resource may have been removed from the config. I think I've avoided that concern by setting the attributes that our helper functions require.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All but the L2Bonded tests passed. I'll pick that up next:

    resource_metal_port_test.go:128: Step 4/5 error: After applying this test step, the plan was not empty.
        stdout:
        
        
        An execution plan has been generated and is shown below.
        Resource actions are indicated with the following symbols:
          ~ update in-place
        
        Terraform will perform the following actions:
        
          # metal_port.bond0 will be updated in-place
          ~ resource "metal_port" "bond0" {
                bonded            = true
                disbond_supported = true
                id                = "c20dbb55-5d59-4de8-9571-be11c7968ac9"
              - layer2            = true -> null
                name              = "bond0"
                network_type      = "layer2-bonded"
                port_id           = "c20dbb55-5d59-4de8-9571-be11c7968ac9"
                type              = "NetworkBondPort"
            }
        
        Plan: 0 to add, 1 to change, 0 to destroy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@displague I think it's because the reset on delete doesn't really happen. The port is not put back to L3 (You can check the POST calls in the standard output when run with TF_LOG=DEBUG and TF_LOG_PATH=logfile.txt). It's because of the reasons I wrote here:
https://github.com/equinix/terraform-provider-metal/pull/116/files#r686799824

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are passing. I'm wondering if bonded and layer2 should be optional+computed. The error we were getting before about layer3 -> null is because layer3 is only set to true or false depending on if l2 or l3` is detected, but it is possible that neither is detected.

@displague displague marked this pull request as ready for review August 16, 2021 17:19
@displague
Copy link
Member

I'm going to merge this so that we may continue testing it within an alpha release.

@displague displague merged commit 6d57288 into main Aug 18, 2021
@displague displague deleted the port_resource branch August 18, 2021 21:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants