15 June 2022

Engineering data consolidation efforts

Between building my Ryzen 5950X system, building the 12900K system, having to completely disassemble the 12900K system, and then building another Ryzen 5950X system whilst arguing with Asus, I have been in the middle of a data consolidation effort for all of the engineering data from the various projects that I've worked on over the years.

Today marks the day that the first pass of this data consolidation effort is complete, and I ended up saving almost 14 TB of storage space.

It feels nice, and I get a sense of accomplishment as the data is being written to tape right now.

I can't believe that it's taken me about six months to finish this data consolidation effort.

At some points during the process of unpacking, packing, and then re-packing the data, both Ryzen 5950Xs, and also the Intel Core i7-4930K that's in the headnode, were oversubscribed 3:1 while processing the data. That just seems pretty crazy to me, and it's also a bit of an indication of how much work the CPUs had to do to process and re-process the data.
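
(By 3:1 oversubscribed, I mean three worker processes per CPU core. The actual consolidation pipeline isn't shown in this post, but as a minimal Python sketch of the idea, with a hypothetical repack() task standing in for the real work:)

import multiprocessing as mp
import os

def repack(path: str) -> str:
    ...  # placeholder: unpack, dedupe, and re-pack one archive
    return path

if __name__ == "__main__":
    workers = (os.cpu_count() or 1) * 3          # 3:1 oversubscription
    archives = ["projectA.tar", "projectB.tar"]  # hypothetical inputs
    with mp.Pool(processes=workers) as pool:
        for done in pool.imap_unordered(repack, archives):
            print("finished", done)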

Not to mention my poor, poor hard drives, which have been working so hard throughout all of this.

13 June 2022

Welp....this is a problem.

Let me begin with the problem statement:

---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF        Device         : mlx5_0
 Number of qps   : 1        Transport type : IB
 Connection type : RC        Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x0c QPN 0x008d PSN 0x277b7c
 remote address: LID 0x05 QPN 0x010f PSN 0xda4554
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          100000           0.000000            0.064007            4.000452
 4          100000           0.00               0.11              3.516592
 8          100000           0.00               0.26              4.078050
 16         100000           0.00               0.52              4.069701
 32         100000           0.00               1.05              4.086223
 64         100000           0.00               2.09              4.074705
 128        100000           0.00               4.27              4.167070
 256        100000           0.00               9.31              4.547246
 512        100000           0.00               12.20             2.978638
 1024       100000           0.00               13.17             1.607263
 2048       100000           0.00               13.64             0.832231
 4096       100000           0.00               13.82             0.421746
 8192       100000           0.00               13.96             0.212971
 16384      100000           0.00               14.08             0.107404
 32768      100000           0.00               14.12             0.053869
 65536      100000           0.00               14.17             0.027029
 131072     100000           0.00               14.19             0.013528
 262144     100000           0.00               14.17             0.006759
 524288     100000           0.00               14.15             0.003375
 1048576    100000           0.00               14.16             0.001688
 2097152    100000           0.00               14.14             0.000843
 4194304    100000           0.00               14.13             0.000421
 8388608    100000           0.00               14.12             0.000210
---------------------------------------------------------------------------------------

What you see above are the results of the 100 Gbps InfiniBand network bandwidth test between my two AMD Ryzen 5950X systems. Both of them have a discrete GPU in the primary PCIe slot, and the Mellanox ConnectX-4 dual-port 100 Gbps InfiniBand NIC is in the next available PCIe slot.
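
(A quick note on reading the table, assuming the usual perftest column definitions: the average bandwidth and message rate columns describe the same measurement, related by BW = message size × 8 × rate. A quick Python check against a few rows from the run above:)

# perftest's BW average [Gb/sec] and MsgRate [Mpps] are two views of
# the same measurement: BW = msg_size * 8 * rate / 1000.
rows = [
    (256, 9.31),        # (message size in bytes, BW average in Gb/s)
    (65536, 14.17),
    (8388608, 14.12),
]
for msg_size, bw_gbps in rows:
    rate_mpps = bw_gbps * 1000 / (msg_size * 8)
    print(f"{msg_size:>8} B -> {rate_mpps:.6f} Mpps")
# -> 4.545898, 0.027027, 0.000210 -- matching the table above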

I can't really tell from the manual for the Asus ROG Strix X570-E Gaming WiFi II motherboard what speed the second PCIe slot is supposed to run at when there is a discrete GPU plugged into the primary PCIe slot.

The Mellanox ConnectX-4 card is a PCIe 3.0 x16 card, which means that the slot itself is supposed to support up to 128 Gbps (and each port is supposed to go up to a maximum of 100 Gbps out of the 128 Gbps that's theoretically available). Even if the slots were running at PCIe 3.0 x4, they should still be capable of 32 Gbps.

As the results show, that is clearly not the case: the bandwidth tops out at around 14.2 Gb/sec.
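
(For reference, a rough back-of-the-envelope of what different link configurations should give, using the standard PCIe encoding overheads; these are my own numbers, not anything out of the Asus manual:)

# Per-lane data rates after encoding overhead:
#   PCIe 3.0: 8 GT/s * (128/130) ~= 7.88 Gb/s per lane
#   PCIe 2.0: 5 GT/s * (8/10)    =  4.00 Gb/s per lane
gen3_lane = 8.0 * 128 / 130
gen2_lane = 5.0 * 8 / 10

configs = {
    "PCIe 3.0 x16": gen3_lane * 16,   # what the ConnectX-4 wants
    "PCIe 3.0 x8":  gen3_lane * 8,
    "PCIe 3.0 x4":  gen3_lane * 4,
    "PCIe 3.0 x2":  gen3_lane * 2,
    "PCIe 2.0 x4":  gen2_lane * 4,
}
for name, gbps in configs.items():
    print(f"{name}: {gbps:6.2f} Gb/s before protocol overhead")

# The measured ~14.1 Gb/s is well below even 3.0 x4 (~31.5 Gb/s);
# it's in the ballpark of a 3.0 x2 (~15.8 Gb/s) or 2.0 x4 (16 Gb/s)
# link once TLP/DLLP protocol overhead is taken off.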

I'll have to see if I can run both of those systems without the discrete GPUs, so that I can plug the Mellanox cards into the primary PCIe slots.
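
(In the meantime, one way to see what the slot actually negotiated is to compare LnkCap against LnkSta in the lspci -vv output. A sketch, assuming a Linux host with pciutils installed, run with enough privilege to read the capability registers:)

# Report the negotiated PCIe link state (LnkSta) vs. the card's
# capability (LnkCap) for any Mellanox device in lspci -vv output.
import subprocess

out = subprocess.run(
    ["lspci", "-vv"], capture_output=True, text=True, check=True
).stdout

header, shown = None, False
for line in out.splitlines():
    if line and not line[0].isspace():
        header, shown = line, False  # new device header line
    elif header and "Mellanox" in header:
        stripped = line.strip()
        if stripped.startswith(("LnkCap:", "LnkSta:")):
            if not shown:
                print(header)
                shown = True
            print("  " + stripped)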


*Update 2022-06-14*:

So I took the discrete GPUs out of both systems, put the Mellanox cards into the primary PCIe slots, and this is what I get from the bandwidth test:

---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF        Device         : mlx5_0
 Number of qps   : 1        Transport type : IB
 Connection type : RC        Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x0c QPN 0x008c PSN 0x5ccdd5
 remote address: LID 0x05 QPN 0x010a PSN 0x178491
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          100000           0.000000            0.066552            4.159479
 4          100000           0.00               0.11              3.529205
 8          100000           0.00               0.27              4.225857
 16         100000           0.00               0.54              4.254547
 32         100000           0.00               1.09              4.254549
 64         100000           0.00               2.19              4.276291
 128        100000           0.00               4.51              4.408332
 256        100000           0.00               9.21              4.498839
 512        100000           0.00               18.60             4.540925
 1024       100000           0.00               36.74             4.485289
 2048       100000           0.00               75.76             4.623960
 4096       100000           0.00               96.55             2.946372
 8192       100000           0.00               96.57             1.473530
 16384      100000           0.00               96.58             0.736823
 32768      100000           0.00               96.58             0.368421
 65536      100000           0.00               96.58             0.184218
 131072     100000           0.00               96.58             0.092109
 262144     100000           0.00               96.58             0.046055
 524288     100000           0.00               96.58             0.023027
 1048576    100000           0.00               96.58             0.011514
 2097152    100000           0.00               96.58             0.005757
 4194304    100000           0.00               96.58             0.002878
 8388608    100000           0.00               96.58             0.001439
---------------------------------------------------------------------------------------



Ahhhh.....much better. That's more like it: 96.58 Gb/sec average out of the link's nominal 100 Gbps.
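
(As a last sanity check, that works out to roughly 97% of the nominal EDR data rate, which is about as good as verbs-level throughput gets once transport headers are accounted for. A quick Python check of the arithmetic:)

# EDR InfiniBand signals at 25.78125 GT/s per lane across 4 lanes,
# with 64b/66b encoding, for a nominal 100 Gb/s of data.
lanes, signal_rate, encoding = 4, 25.78125, 64 / 66
nominal = lanes * signal_rate * encoding
measured = 96.58   # BW average at large message sizes, from above

print(f"nominal : {nominal:.2f} Gb/s")
print(f"measured: {measured:.2f} Gb/s ({measured / nominal:.1%} of line rate)")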