
IX won't TX #31

Open
ssps opened this issue Jan 29, 2018 · 26 comments

Comments

@ssps

ssps commented Jan 29, 2018

Dear IX developers,

I'd like to reproduce the experiments from the IX and ZygOS papers in my own environment, but I've encountered the following issues.
Here are some details about the environment and problems.

  1. General environment
    CPU: Intel Xeon E5-2690
    RAM: 120GB
    Kernel: 4.4.0-104-generic
    DPDK version: 16.04 (installed following the guides)
    10GbE NIC: Intel 82599ES
    40GbE NIC: Intel XL710

  2. 10GbE Multi-port TX

  • Only the first NIC listed in the configuration can RX / TX packets.
  • The rest of the NICs can RX packets, but they cannot TX packets out to the wire.

  3. 40GbE TX

  • The 40GbE NIC can RX packets, but it won't TX packets out to the wire.

Regards,
Ilwoo

@prekageo
Contributor

Regarding DPDK: You say that you installed it following the guides. What exactly do you mean? IX and Zygos automatically compile and use a compatible version of DPDK as part of their Makefile. I suggest that you use the following repository to build and use IX and Zygos: https://github.com/ix-project/zygos-bench

Keep in mind that ZygOS, unlike IX, doesn't support multiple NICs.

Regarding the 40GbE, is it IX or Zygos that doesn't work? Or both? Can you give the ix.conf and the command line used?

@ssps
Author

ssps commented Jan 30, 2018

Dear @prekageo ,

Thanks for your response.

  1. DPDK
    It looks like I chose the wrong word, "install".
    I've used the commands provided in README.md to build and run IX.

  2. 40GbE and IX, Zygos
    I've tested both systems over the 10GbE link.
    For the 40GbE NIC, I've tested only IX and couldn't detect any outgoing packets on the wire.

  3. IX configuration file
    Uploaded as .txt since GitHub doesn't support file attachments with unknown file types.

ix.conf.txt

@prekageo
Contributor

I guess you have 2 different problems:

  1. IX with multiple NICs uses only the first NIC for tx. Is that correct? In that case, what kind of packets do you send and receive? ARP, UDP, TCP? How do you observe rx and tx?

  2. IX with 40GbE doesn't TX. What kind of packets do you try to transmit? Have you tried doing an arping against IX from another host of your subnet?

@ssps
Author

ssps commented Jan 30, 2018

Dear @prekageo ,

I've wired the two machines directly with a cable during the test.
The server (echoserver) runs on top of the IX stack and the client runs on top of the kernel stack.

  1. Correct.
    The stack has been tested with ICMP ping (the Linux ping command) and the TCP client (provided in ix-bench).
    Neither of the clients received return packets from the IX server.
    I've set up some hooks in ix/dp/net/ip.c and ix/inc/ix/ethqueue.h to verify that the stack receives the correct packets and enqueues outgoing packets.
    IX successfully receives the SYN packet and enqueues a SYN/ACK to the target device, but the SYN/ACK packet cannot be observed on the other end.

  2. As in 1., ICMP ping and TCP connections.
    I've tested the system with arping as you recommended, and IX fails to reply to ARP requests.
    mTCP applications reply fine without any problem.

@prekageo
Contributor

Regarding point 1: ICMP always uses the first NIC. TCP should use the same NIC for Tx as Rx. How do you connect 2 servers together via cable if one of them has multiple NICs? I don't understand your network topology.

@ssps
Author

ssps commented Jan 30, 2018

Dear @prekageo

I'll explain how I tested the TCP connection with multiple NICs.

  1. Hardware

Both machines have two Intel 10G NICs (for 40GbE, I've replaced one of the NICs with the 40GbE NIC).
Two cables are used to connect the first port of each NIC to the other machine's NICs.
If I denote a cable connection as "<->", it can be drawn as follows.

[machine0] [machine1]
05:00.0 <-> 03:00.0
07:00.0 <-> 05:00.0

  2. Configuration

2.1 Server (IX)
[ix.conf]
host_addr="10.0.0.2/24"
port=8000
...
devices=["05:00.0", "07:00.0"]
...

2.2 Client (Linux)
Two netdev entries are available.

p3p1 (directly connected with "05:00.0")
p5p1 (directly connected with "07:00.0")

  3. Shell commands

[Server]
Run the server with the following command.
# ./dp/ix -- apps/echoserver 64

[Client]
Run the client with two sets of commands.

# ifconfig p5p1 down
# ifconfig p3p1 10.0.0.6/24
$ ./client 10.0.0.2 8000 100 64 100 (works fine)

and

# ifconfig p3p1 down
# ifconfig p5p1 10.0.0.6/24
$ ./client 10.0.0.2 8000 100 64 100 (does not work)

  4. Side note

If I modify "devices" line in [ix.conf] to ["07:00.0", "05:00.0"], now p5p1 (which is connected to 07:00.0) can receive packets from IX while p3p1 (which is connected to 05:00.0) won't.

@prekageo
Contributor

Unfortunately, this is an invalid network configuration. Unless you modify the MAC addresses of your devices, this setup will not work. Moreover, I am not really sure what you are trying to achieve with this setup. If you want to understand how to properly set up a bond, read the following: https://www.kernel.org/doc/Documentation/networking/bonding.txt

@ssps
Author

ssps commented Jan 30, 2018

Dear @prekageo ,

  1. Purpose
    I cannot run the 40Gbps experiment (1 server, 4 clients) in a different setup due to this problem.
    I'm currently trying to identify the cause, and the above is my debug setup.

  2. Bonding
    You're suggesting that I use a bonding driver setup on the client, right?
    I'm using static ARP entries on both sides and re-launching the IX server whenever I switch the client configuration.
    The MAC addresses on the client side and the server side won't interfere here.
    A bonding driver could reduce the time spent setting up the client; I'll use that in the future.

P.S. Does IX try to send packets with the MAC address of the first NIC port?

@prekageo
Contributor

Use tcpdump on the Linux side to verify. Your setup is not valid. Even Linux will not work with this setup. You have to set up a proper bond interface and first make sure that Linux works with it.

@ssps
Author

ssps commented Jan 30, 2018

Dear @prekageo ,

Connectivity on both NICs has been verified with the Linux stack on both sides.
Obviously, I've used different subnets to overcome the routing issue in the kernel.
However, this is not an issue in my debug setup, as I turn off one NIC while the other is active.

Could you clarify why bonding the NICs is so important?

@prekageo
Contributor

Are you using bonding or not with Linux? There is a single way to set up bonding under Linux. You have to follow the document I pointed out earlier. If you aren't using bonding under Linux, how do you expect IX to work under your setup when you specifically ask IX to operate in bond mode? You have to read and understand better what a bond is and how to set it up. Hint: there is a single MAC for all the adapters in a bond.

@ssps
Author

ssps commented Jan 30, 2018

Dear @prekageo

Okay now I understand.

I thought you were asking me to set up bonding on the client side.
I'll try to bond the NICs on the IX server side and see how it works.

BTW, shouldn't IX use the DPDK bonding driver internally, if it's designed around it?

@prekageo
Contributor

I assume that you are no longer interested in this issue. If that's not the case, please re-open it.

@kkaffes

kkaffes commented May 9, 2019

I have the same problem as @ssps. IX using a 40Gb Intel XL710 NIC is able to RX packets but cannot transmit anything on the wire.

Setup:
CPU: Intel Xeon E5-2620
Kernel: 4.4.0-145-generic
DPDK version: 16.04
40GbE NIC: Intel XL710
Manually set ARP entries

@prekageo
Contributor

Hi Kostis,

Please describe your networking setup and what you have tried so far.

@prekageo reopened this May 10, 2019
@kkaffes

kkaffes commented May 10, 2019

Thanks George!

We have two Intel Xeon E5-2620 servers with Intel XL710 40Gb NICs connected through a Barefoot switch.

Initially, I successfully pinged between the two machines:
Server 1: 10.1.0.11/24
Server 2: 10.1.0.12/24

Then, I ran the IX echoserver on Server 2 using the following config:

host_addr="10.1.0.12/24"
gateway_addr="10.1.0.1"
port=1234
devices="0:05:00.0"
cpu=0
batch=64
loader_path="/lib64/ld-linux-x86-64.so.2"
arp=(
  {
     ip : "10.1.0.11" 
     mac : "3c:fd:fe:c3:e0:60"
  }
)

and the client on server 1:

echo 123 | nc -vv 10.1.0.12 1234

which never returns. I also tried pinging without any success.

After I added print messages, I found out that packets were received and processed at server 2. Replies were generated, added to the TX queue, and I40E_PCI_REG_WRITE was called, but nothing was sent out to the wire.
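
For reference, a minimal sketch of the kind of print messages I mean; the helper name, arguments, and call site are illustrative assumptions, not the actual IX driver code:

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: a debug print placed right before the I40E_PCI_REG_WRITE
 * doorbell update, recording the queue index and the tail value about to be
 * written, so a stuck TX path is visible even when nothing reaches the wire. */
static void debug_log_tx_doorbell(uint16_t queue, uint32_t tail)
{
  printf("i40e tx: queue %u, writing tail %u\n", queue, tail);
}

int main(void)
{
  debug_log_tx_doorbell(0, 1);  /* e.g. one SYN/ACK descriptor enqueued on queue 0 */
  return 0;
}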

@prekageo
Contributor

  1. Please try to arping 10.1.0.12 from the client.
  2. Try to use ZygOS instead of IX.

@kkaffes

kkaffes commented May 10, 2019

  1. It turns out that the switch (which I do not control) drops all ARP packets, and therefore any arping fails. When I manually add the ARP entries using arp -s, the non-IX ping between the two servers works. However, even though I have the same entry in ix.conf on the server side, the IX server still does not send anything out to the wire. That's why I think it is not an ARP issue.
  2. TX does not work for ZygOS either.

@prekageo
Contributor

  1. Can you connect a cable directly between the 2 servers?
  2. Can you try with another switch?
  3. How many network cables are attached to each server specifically? 100M, 1G, 10G, 40G? How many NICs exist and which of them are operational?
  4. Have you tried another DPDK application, e.g. pktgen?

@kkaffes

kkaffes commented May 10, 2019

  1. I had tried that some time ago with a different set of servers and TX did not work there either.
  2. Unfortunately, no.
  3. On each server there are 3 dual-port NICs: 2 of them are Gigabit Broadcom NetXtreme BCM5720 and one is an Intel XL710 QSFP+. One port of one of the Broadcom NICs is connected to a Gigabit switch via a 1G cable, and one port of the XL710 is connected to the Barefoot switch using a 40G cable.
  4. I tried a simple custom DPDK application where I hard-code the receiver's IP and MAC addresses and it works, i.e., packets are sent from server 2, delivered to server 1, and captured by tcpdump.

Since the DPDK app is working, the problem must be in the i40e IX/ZygOS driver. Have you tested it recently in your setup?

Thanks again for your help!

@prekageo
Contributor

I no longer have access to hardware to test it. I would suggest that you dump all the bytes transmitted by IX/ZygOS and verify that they are the same as the bytes transmitted by your custom DPDK application. For example, are the source and destination MACs correct?
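
For example, a rough sketch of the kind of check I mean, assuming you can capture the raw frame bytes from both stacks (generic code, not tied to IX or DPDK):

#include <stdint.h>
#include <stdio.h>

/* Print the Ethernet header fields of a captured frame so the bytes emitted
 * by IX/ZygOS can be compared field by field against the working DPDK app. */
static void dump_eth_header(const uint8_t *frame, size_t len)
{
  if (len < 14) {
    printf("frame too short (%zu bytes)\n", len);
    return;
  }
  printf("dst %02x:%02x:%02x:%02x:%02x:%02x src %02x:%02x:%02x:%02x:%02x:%02x ethertype 0x%02x%02x\n",
         frame[0], frame[1], frame[2], frame[3], frame[4], frame[5],
         frame[6], frame[7], frame[8], frame[9], frame[10], frame[11],
         frame[12], frame[13]);
}

int main(void)
{
  /* Stand-in frame: destination MAC from the ix.conf ARP entry, placeholder
   * source MAC, IPv4 ethertype. */
  const uint8_t frame[60] = { 0x3c, 0xfd, 0xfe, 0xc3, 0xe0, 0x60,
                              0x00, 0x11, 0x22, 0x33, 0x44, 0x55,
                              0x08, 0x00 };
  dump_eth_header(frame, sizeof(frame));
  return 0;
}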

@kkaffes

kkaffes commented May 13, 2019

Thank you George. I will continue to work on this and submit a pull request once I find and fix the issue.

@kkaffes

kkaffes commented May 17, 2019

After a lot of digging and basically reimplementing the i40e driver from scratch, I found out that the TX problem is fixed by changing this line:

tx_ctx.rdylist = 1;

to

tx_ctx.rdylist = 0;

Since I do not know what side-effects this might cause to other systems, let me know if it is OK to submit a PR.

@prekageo
Contributor

First of all, thanks for finding this! Actually, 0 or 1 just happen to work depending on how the device is configured. The appropriate way to fix this issue would be to replace this:

ix/dp/drivers/i40e.c

Lines 280 to 287 in b4825cc

/* clear the context structure first */
memset(&tx_ctx, 0, sizeof(tx_ctx));
tx_ctx.new_context = 1;
#ifdef RTE_LIBRTE_IEEE1588
tx_ctx.timesync_ena = 1;
#endif
tx_ctx.rdylist = 1;
tx_ctx.fd_ena = TRUE;

with a code sequence that does the following:

  1. i40e_get_lan_tx_queue_context()
  2. check return for errors

Then, you let the existing code modify base and qlen and call i40e_set_lan_tx_queue_context.
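
A rough sketch of what that sequence could look like in dp/drivers/i40e.c, reusing the hw, pf_q, and txq variables already in scope there and the TX queue context type used by the i40e driver (untested, so treat it as an outline rather than a final patch):

struct i40e_hmc_obj_txq tx_ctx;
int err;

/* Read the current TX queue context from the device instead of zero-filling
 * it and hard-coding fields such as rdylist. */
err = i40e_get_lan_tx_queue_context(hw, pf_q, &tx_ctx);
if (err != I40E_SUCCESS) {
  log_err("Failed to get LAN TX queue context.\n");
  return err;
}

/* Only the ring base and length need to change. */
tx_ctx.base = txq->ring_physaddr / I40E_QUEUE_BASE_ADDR_UNIT;
tx_ctx.qlen = txq->len;

err = i40e_set_lan_tx_queue_context(hw, pf_q, &tx_ctx);
if (err != I40E_SUCCESS) {
  log_err("Failed to set LAN TX queue context.\n");
  return err;
}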

I will be more than happy to merge such a pull request. Please test it before submitting it, as I can no longer test this code.

@kkaffes

kkaffes commented May 20, 2019

No problem! I tried what you suggested but it seems like i40e_get_lan_tx_queue_context() always zeroes tx_ctx even though the return value is I40E_SUCCESS. For example:

log_info("txq->len = %u\n", txq->len);
ret = i40e_get_lan_tx_queue_context(hw, pf_q, &tx_ctx);
if (ret != I40E_SUCCESS) {
  log_err("Failed to get LAN TX queue context.\n");
  return ret;
}

tx_ctx.base = txq->ring_physaddr / I40E_QUEUE_BASE_ADDR_UNIT;
tx_ctx.qlen = txq->len;

ret = i40e_set_lan_tx_queue_context(hw, pf_q, &tx_ctx);
if (ret != I40E_SUCCESS) {
  log_err("Failed to get LAN TX queue context.\n");
  return ret;
}

ret = i40e_get_lan_tx_queue_context(hw, pf_q, &tx_ctx);
if (ret != I40E_SUCCESS) {
  log_err("Failed to get LAN TX queue context.\n");
  return ret;
}
log_info("tx_ctx.qlen = %u\n", tx_ctx.qlen);

prints

txq->len = 4096
tx_ctx.qlen = 0

The same also happens for the other fields, e.g., tx_ctx.base and tx_ctx.fd_ena.

@prekageo
Contributor

Given my lack of access to the hardware, I can only provide you with high-level instructions on how to figure out what's going on:

  1. Don't go into Dune mode so that you can use gdb. Better yet, just copy this code into a separate C file to compile and debug. Or just use the DPDK application that you mentioned that you have already written.
  2. Compile DPDK with export EXTRA_CFLAGS='-O0 -g' to have debug symbols.
  3. Single-step through the i40e_get_lan_tx_queue_context function and figure out why it reads zeros.
  4. If it does indeed read zeros, then single-step through i40e_set_lan_tx_queue_context and figure out if and why it writes zeros.
