15 June 2022
Engineering data consolidation efforts
Since I built my Ryzen 5950X system and the 12900K system, then had to completely disassemble the 12900K system, and then built another Ryzen 5950X system whilst arguing with Asus, I have been in the middle of a data consolidation effort for all of the engineering data from the various projects that I've worked on over the years.
Today marks the day that the first pass of this data consolidation effort has been completed, and I ended up saving almost 14 TB of storage space.
It feels nice, and I get a sense of accomplishment as the data is being written to tape right now.
I can't believe that it's taken me about 6 months to finish this data consolidation effort.
At some points during the process of unpacking, packing, and then re-packing the data, both Ryzen 5950Xs, as well as the Intel Core i7-4930K in the headnode, were oversubscribed 3:1 while processing the data. That just seems pretty crazy to me, because it's also a bit of an indication of how much work the CPUs had to do to process and re-process the data.
Not to mention my poor, poor hard drives, which have been working so hard throughout all of this.
13 June 2022
Welp....this is a problem.
Let me begin with the problem statement:
What you see above are the results of the 100 Gbps Infiniband network bandwidth test between my two AMD Ryzen 5950X systems. Both of them have a discrete GPU in the primary PCIe slot, and the Mellanox ConnectX-4 dual-port, 100 Gbps Infiniband NIC is in the next available PCIe slot.
I can't really tell from the manual for the Asus ROG Strix X570-E Gaming WiFi II motherboard what speed the second PCIe slot is supposed to run at when there is a discrete GPU plugged into the primary PCIe slot.
The Mellanox ConnectX-4 card is a PCIe 3.0 x16 card, which means that the slot itself is supposed to support up to 128 Gbps (and the ports themselves are supposed to go up to a maximum of 100 Gbps out of the 128 Gbps that's theoretically available). If the slot were running at PCIe 3.0 x4, it would only be capable of 32 Gbps.
As the results show, that is clearly not the case.
I'll have to see if I can run both of those systems without the discrete GPU, so that I can plug the Mellanox cards into the primary PCIe slot.
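As an aside, a way to confirm what link width the second slot actually negotiated (rather than guessing from the manual) is to check the PCIe link status that lspci reports for the ConnectX-4. Something along these lines should do it, where 04:00.0 is just a placeholder for whatever address lspci shows on your system:
# Find the PCIe address of the Mellanox NIC
lspci | grep -i mellanox
# Compare what the card supports (LnkCap) against what the slot negotiated (LnkSta)
sudo lspci -vv -s 04:00.0 | grep -E 'LnkCap:|LnkSta:'
If LnkSta reports a narrower width (e.g. x4) than the x16 shown in LnkCap, the slot itself is the bottleneck.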
*Update 2022-06-14*:
So I took out the discrete GPUs from both systems, put the Mellanox cards into the primary PCIe slots, and this is what I get from the bandwidth test results:
---------------------------------------------------------------------------------------
Send BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
TX depth : 128
CQ Moderation : 100
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x0c QPN 0x008c PSN 0x5ccdd5
remote address: LID 0x05 QPN 0x010a PSN 0x178491
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
2 100000 0.000000 0.066552 4.159479
4 100000 0.00 0.11 3.529205
8 100000 0.00 0.27 4.225857
16 100000 0.00 0.54 4.254547
32 100000 0.00 1.09 4.254549
64 100000 0.00 2.19 4.276291
128 100000 0.00 4.51 4.408332
256 100000 0.00 9.21 4.498839
512 100000 0.00 18.60 4.540925
1024 100000 0.00 36.74 4.485289
2048 100000 0.00 75.76 4.623960
4096 100000 0.00 96.55 2.946372
8192 100000 0.00 96.57 1.473530
16384 100000 0.00 96.58 0.736823
32768 100000 0.00 96.58 0.368421
65536 100000 0.00 96.58 0.184218
131072 100000 0.00 96.58 0.092109
262144 100000 0.00 96.58 0.046055
524288 100000 0.00 96.58 0.023027
1048576 100000 0.00 96.58 0.011514
2097152 100000 0.00 96.58 0.005757
4194304 100000 0.00 96.58 0.002878
8388608 100000 0.00 96.58 0.001439
---------------------------------------------------------------------------------------
Ahhhh.....much better. That's more like it.
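For reference, output like the above comes from the ib_send_bw tool in the perftest package; a run between the two nodes would look something along these lines (mlx5_0 matches the device shown in the header above, and server-node is a placeholder hostname):
# On the first node, start the server side on the ConnectX-4:
ib_send_bw -a -d mlx5_0 --report_gbits
# On the second node, run the client side and point it at the first node:
ib_send_bw -a -d mlx5_0 --report_gbits server-node
The -a flag sweeps message sizes from 2 bytes up to 8,388,608 bytes (8 MiB), which is why the table runs over that range, and --report_gbits prints bandwidth in Gb/sec instead of MB/sec.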