Welp.
This happened:
"Your BIOS is broken; bad RMRR"
Oh joys.
[/sarcasm]
A blog about the intersection of engineering and information technology (IT)
EVGA Nvidia GTX 980 passthrough in Proxmox does not work.
I've tried basically everything at this point, and nothing has been able to pass my EVGA Nvidia GTX 980 SC w/ ACX 2.0 through to a Windows 10 VM in Proxmox 7.3-3.
I figured that I would leave this nugget for people to find, should other people attempt the same in the future.
It didn't matter what settings I used for the /etc/pve/qemu-server/<<VMID>>.conf and/or /etc/default/grub.
Nothing worked. Save yourself a lot of time and skip trying.
I get the Windows Code 43 error: Windows can tell that it is a GTX 980, but it can't start the driver for it.
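For anyone retracing these steps, one quick sanity check is whether the options set in /etc/default/grub actually reached the running kernel. The post does not say which options were tried, so the two flags below are an assumption (they are the kernel parameters commonly used for passthrough on Intel hosts), and the helper names are hypothetical. This is a minimal sketch, not a fix for Code 43:

```python
import os

# Assumption: these two kernel parameters are the ones commonly added to
# GRUB_CMDLINE_LINUX_DEFAULT for passthrough on Intel hosts. The post does
# not list which settings were actually tried.
EXPECTED_FLAGS = ("intel_iommu=on", "iommu=pt")


def iommu_flags(cmdline: str) -> dict:
    """Report which expected flags appear as whole tokens in a kernel command line."""
    tokens = cmdline.split()
    return {flag: flag in tokens for flag in EXPECTED_FLAGS}


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the running kernel's command line, or '' if the file is missing."""
    if not os.path.exists(path):
        return ""
    with open(path) as handle:
        return handle.read().strip()


# On the Proxmox host this prints which flags the booted kernel really has.
print(iommu_flags(read_cmdline()))
```

If a flag shows up as missing here after editing /etc/default/grub, the change never made it into the boot configuration, which is worth ruling out before blaming the card.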
I bought a new Core i7-13700K and installed it into the Asus Z690 Prime P that I bought about a year ago.
Question: How do you update the BIOS for a system that won't POST (so that it would be able to recognise said new processor)?
My father-in-law, I think, was originally having some kind of problem with his old, old computer, and as a result, I ended up giving him my old Intel Core 2 Quad Q9550 system.
Recently, said Q9550 system started to have some issues, so I gave him my Intel NUC NUC7i3BNH. It has an Intel Core i3-7100U (2-core, HTT enabled) and originally came with 4 GB of RAM, which I upgraded to 8 GB (2x 4 GB). It also originally came with a 16 GB Intel Optane module and a 1 TB HGST 2.5" 7200 rpm HDD, but I swapped that out, I think, for an Intel 520 Series SSD. Anyways, I digress.
I don't know if I ever took power measurements for the NUC (probably not), so let's instead compare against, for example, the Beelink GTR5 5900HX system, which, at idle, could be sipping somewhere between 9 and maybe 16 W of power.
Compare and contrast that to the old Q9550 system, which has 4x 2 GB G.Skill DDR2-800 RAM, an Nvidia GTX 260, a 610 W PSU, and a single SSD (I think it's an Intel 525s Series 240 GB). At idle, it is sucking back somewhere between 120 and 160 W.
That's CRAZY!!!
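To put those idle numbers in perspective, here is a rough sketch of what they add up to over a year. It assumes the machine is powered on and idle 24/7, which is a simplification for a would-be server, and it only uses the wattage ranges quoted above:

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_kwh(idle_watts: float) -> float:
    """Energy used in one year at a constant idle draw, in kWh."""
    return idle_watts * HOURS_PER_YEAR / 1000.0


# Idle ranges quoted in the post, in watts (low, high).
systems = {
    "Beelink GTR5 5900HX": (9, 16),
    "Core 2 Quad Q9550": (120, 160),
}

for name, (low, high) in systems.items():
    print(f"{name}: {annual_kwh(low):.0f}-{annual_kwh(high):.0f} kWh/year")
```

Multiply the result by your own electricity rate to turn it into a yearly cost.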
I thought that I was going to re-purpose that system to be a server of some kind. But now, I'm not so sure.
Granted, the Q9550 system does have a Gigabyte EP45-UD3P motherboard in it, and as such, sports 8 SATA 3 Gbps ports. And the Intel Core 2 Quad Q9550 does support Intel VT-x and Intel VT-d, which means that, in theory, I can run a few virtual machines on it, throw TrueNAS onto that system, and make it into a storage server.
I don't know if I'm going to do that for sure yet, but it is a potential option.
But man, that idle power is really making me re-think that plan. (Sadly, I'm not sure if newer servers would really be that much more efficient. Desktop systems and/or mini-PCs, yeah, but towers and/or servers - I don't know about that.)
https://www.cometforums.com/topic/12802501-bitcomet-causing-excessive-ping-times/page/2/
"If you are so unhappy using BitComet as your client (you have previously stated that you are using it at the same time as two other clients), I suggest that, at least, you show that you do possess some amount of common courtesy."
Read my initial posts.
I was merely and simply stating "hey, I think there's a problem here".
Rhubarb repeatedly denied that the problem even exists, let alone offer anything that resembles help and/or assistance.
Rhubarb's response is akin to how companies blame independent media outlets when said independent media outlets find issues with said company's products. (Which Steve from GamersNexus makes references to here:
The fact that Rhubarb argues on here that it's not about ping times, when I cited the old forum posts where Rhubarb even SPECIFICALLY and EXPLICITLY asks for ping time data, is laughable at the very least.
Interesting how you make no mention of this fact in your reply.
If Rhubarb is going to be belligerent, then you can't be surprised when said belligerence is going to be met with belligerence.
If Rhubarb doesn't know how to help and/or doesn't want to help, then he could've just plainly and pointedly stated that.
But that wasn't the case here.
"I have been a resident of BitComet Forums, assuredly, a lot longer than yourself, and I am always amazed at how imperious some users like to sound and, rather than thanking those who donate their free time to attempt to aid others (not being remunerated, by the way), feel it is their God-given right to insult and try to belittle them, just because they do not see eye-to-eye with what is suggested. Thank Goodness that this is not the case of the vast majority of the more than 100,000 worldwide users of this free application!! "
Once again, if you have actually READ my posts and Rhubarb's responses, it LITERALLY reads:
Me: "Hey, I think there's a problem with the program."
Rhubarb: "No, there isn't."
Me: "Yes, there is. And here is the data to prove it."
Rhubarb: "No, there isn't."
Me: "Well, I googled it and this is how I found this forum because other people reported about the same thing."
Rhubarb: "If this was a problem, there'd be all sorts of 'me too' posts."
Me: "But there are 'me too' posts."
Rhubarb: "No, there isn't."
Me: "Yes, there are. Here are the quotations from those threads, and here are the links to those posts."
Rhubarb: "No, there isn't."
(see a pattern here?)
So, why would I thank someone who is in denial about a problem???
That makes no sense.
Would you ever say "thank you for beating me" to an alcoholic who beats their wife and kids (because they're an alcoholic)? That's absolutely ridiculous.
You can literally conduct your internal review of how Rhubarb could've handled this better cuz right now, he's at the same level as the Enermax issue.
"Would that decrease your infinite rage and please your Magnanimous self? "
Why would I need to do that?
Again, the other thread tells you that it's a problem with how the client tries to establish a connection to the DHT network.
It's a very simple question: on startup, what does the client attempt to do as it is trying to establish a connection to the DHT network?
It would appear that no one here has ever bothered to ask this very simple, basic question as it pertains to this issue, which might be responsible for both this thread and the previous thread that was filed a year and 8 months ago.
"Thank them profusely for fixing my car and go on my merry way. There is a Spanish proverb that says that 'to be thankful is a sign of being well-bred' ("Ser agradecido es de ser bien nacido"). "
And that's the difference between your mechanic and Rhubarb.
Rhubarb never made it to trying to profusely fix the client.
That's the difference between your scenario and this one.
---
The response from their forum is an example of how not to handle a problem when users/clients are reporting a problem.
You can read the rest of the thread to see what I'm talking about there.
BitComet sucks.
Use something else instead.
*edit 2022-09-01*
BWAHAHAHAHA.....
The mods at the BitComet forum have now banned me from said forum because I reported an issue, and they refused to fix it.
LOL...LMAO....
Fuck BitComet. It's LITERAL trash.
So this test is based on the same testcase, but just testing it with two different CFD applications.
Both are steady-state solutions (which is normally used to initialise the flow field for the transient solution, which I am not testing at the moment).
The AMD Ryzen 9 5950X cluster is two nodes, where each node has an AMD Ryzen 9 5950X (16 cores, SMT disabled), 128 GB of DDR4-3200 unbuffered, non-ECC RAM, and a Mellanox ConnectX-4 MCX456A-ECAT 100 Gbps Infiniband network card. The Xeon cluster is also two nodes, each with dual Intel Xeon E5-2690 (V1, 8 cores each, HTT disabled for both processors) and 128 GB of DDR3-1866 2Rx4 Registered ECC RAM running at DDR3-1600 speeds.
In one of the applications, the AMD Ryzen 9 5950X cluster finishes the solution in 23342.021 seconds whilst the Xeon pair of nodes finishes the same steady-state solution in 15834.675 seconds (or about a 32.16% reduction in wall clock time), which is rather significant. This run has about 13.4 million cells, and it takes this long because it is running for 1000 iterations.
And then in another, different CFD application, also running a steady-state solution but for 48 iterations, the AMD system finishes the solution in 292.665 seconds whilst the Xeon system finishes it in 264.48 seconds, or about a 9.63% reduction in wall clock time.
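For reference, the percentages quoted for both runs are reductions in wall clock time relative to the slower (Ryzen) result. A short sketch of that arithmetic, using only the timings given above (the function name is just for illustration):

```python
def percent_reduction(slower_s: float, faster_s: float) -> float:
    """Reduction in wall clock time, as a percentage of the slower run."""
    return (slower_s - faster_s) / slower_s * 100.0


# (Ryzen cluster seconds, Xeon cluster seconds) as reported in the post.
runs = {
    "first CFD application, 1000 iterations": (23342.021, 15834.675),
    "second CFD application, 48 iterations": (292.665, 264.48),
}

for label, (ryzen_s, xeon_s) in runs.items():
    print(f"{label}: {percent_reduction(ryzen_s, xeon_s):.2f}% less wall clock time on the Xeons")
```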
That's really interesting that the AMD Ryzen 9 system, despite it being 8 and a half years newer, still isn't able to be as fast as an older Xeon-based cluster.
There are really only two upsides to using the Ryzen-based system over the Xeon-based system:
1) The Ryzen based system uses quite a lot less power compared to the Xeon cluster. It isn't surprising to see power consumption, under load, of around or upwards of 1 kW for just running two nodes of the Xeon cluster (and running all four nodes pushes that total up to somewhere between 1.6-1.9 kW), whereas the Ryzen based systems combined are probably only using about 400 W total.
2) The Ryzen based system is a LOT quieter than the Xeon Supermicro Twin Pro^2 server (6027TR-HTRF).
So, if you're running it in a home lab environment where you don't live by yourself, then despite it being slower, it might still be a better alternative for these two reasons.
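Combining the rough power figures with the wall clock times gives a ballpark energy cost per run. This is a sketch only: the 1 kW and 400 W values are the loose estimates from above, not measurements taken during these exact runs, and it assumes the draw stays constant for the whole solve.

```python
def energy_kwh(power_watts: float, runtime_s: float) -> float:
    """Energy used by a run at constant power draw, in kWh."""
    return power_watts / 1000.0 * runtime_s / 3600.0


# Assumed constant load power (W) and the 1000-iteration runtimes (s) from the post.
clusters = {
    "Xeon E5-2690 pair": (1000.0, 15834.675),
    "Ryzen 9 5950X pair": (400.0, 23342.021),
}

for name, (watts, seconds) in clusters.items():
    print(f"{name}: {energy_kwh(watts, seconds):.2f} kWh for the run")
```

Under those assumptions, the slower Ryzen pair still finishes the job on noticeably less energy than the Xeon pair.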
And the Ryzen based solution is certainly cheaper than the Threadripper, Threadripper Pro, and/or AMD EPYC platforms, where you might be able to get some of that performance back, but I can't say for certain without actually testing it myself. I thought that having the 16 faster clock speed cores on the Ryzen 9 5950X would be faster than the Xeon E5-2690 platform. Based on the data and the results, I stand corrected.
I did not expect that.
Since I built my Ryzen 5950X system, and the 12900K system, and then had to completely disassemble the 12900K system, and then built another Ryzen 5950X system whilst arguing with Asus, I was in the middle of a data consolidation effort for all of my engineering data from the various projects that I've worked on over the years.
Today marks the day where the first pass of this data consolidation effort has completed and I ended up saving almost 14 TB of storage space.
It feels nice, and I get a sense of accomplishment as the data is being written to tape right now.
I can't believe that it's taken me like about 6 months to finish this data consolidation effort.
At some points during the process of unpacking, packing, and then re-packing the data, both Ryzen 5950Xs and also the Intel Core i7-4930K that's in the headnode were oversubscribed 3:1 when they were processing the data. That just seems pretty crazy to me because that's also a little bit of an indication as to how much work the CPUs had to do to process and re-process the data.
Not to mention, my poor, poor hard drives, that have been working so hard throughout all of this.
Let me begin with the problem statement:
What you see above are the results from the 100 Gbps Infiniband network bandwidth test between my two AMD Ryzen 5950X systems. Both of them have a discrete GPU in the primary PCIe slot, and then the Mellanox ConnectX-4 dual port, 100 Gbps Infiniband NIC is in the next available PCIe slot.
I can't really tell from the motherboard manual for the Asus ROG Strix X570-E Gaming WiFi II motherboard what speed the second PCIe slot is supposed to be when there is a discrete GPU plugged into the primary PCIe slot.
The Mellanox ConnectX-4 card is a PCIe 3.0 x16 card, which means that the slot itself is supposed to support up to 128 Gbps (and the ports themselves are supposed to go up to a maximum of 100 Gbps out of the 128 Gbps that's theoretically available). If the slots were running as PCIe 3.0 x4, they should be capable of 32 Gbps.
As the results show, clearly, that is not the case.
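The slot numbers above come from multiplying the PCIe 3.0 per-lane rate (8 GT/s) by the lane count. A quick sketch of that arithmetic follows, with the 128b/130b line encoding that PCIe 3.0 uses included as an option. Packet and protocol overhead lowers real throughput further and is not modelled here.

```python
PCIE3_GT_PER_LANE = 8.0         # PCIe 3.0 raw signalling rate per lane, in GT/s
PCIE3_ENCODING = 128.0 / 130.0  # 128b/130b encoding: 128 payload bits per 130 sent


def pcie3_gbps(lanes: int, after_encoding: bool = False) -> float:
    """One-direction PCIe 3.0 link bandwidth in Gbps for a given lane count."""
    raw = PCIE3_GT_PER_LANE * lanes
    return raw * PCIE3_ENCODING if after_encoding else raw


for lanes in (4, 8, 16):
    print(
        f"x{lanes}: {pcie3_gbps(lanes):.0f} Gbps raw, "
        f"{pcie3_gbps(lanes, after_encoding=True):.2f} Gbps after encoding"
    )
```

Only an x16 link leaves enough headroom for a 100 Gbps port; x8 and x4 links top out well below that.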
I'll have to see if I can run both of those systems without the discrete GPU, so that I can plug the Mellanox cards into the primary PCIe slot.
*Update 2022-06-14*:
So I took out the discrete GPUs from both systems and put the Mellanox card into the primary PCIe slot and this is what I get from the bandwidth test results:
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x0c QPN 0x008c PSN 0x5ccdd5
 remote address: LID 0x05 QPN 0x010a PSN 0x178491
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          100000         0.000000           0.066552             4.159479
 4          100000         0.00               0.11                 3.529205
 8          100000         0.00               0.27                 4.225857
 16         100000         0.00               0.54                 4.254547
 32         100000         0.00               1.09                 4.254549
 64         100000         0.00               2.19                 4.276291
 128        100000         0.00               4.51                 4.408332
 256        100000         0.00               9.21                 4.498839
 512        100000         0.00               18.60                4.540925
 1024       100000         0.00               36.74                4.485289
 2048       100000         0.00               75.76                4.623960
 4096       100000         0.00               96.55                2.946372
 8192       100000         0.00               96.57                1.473530
 16384      100000         0.00               96.58                0.736823
 32768      100000         0.00               96.58                0.368421
 65536      100000         0.00               96.58                0.184218
 131072     100000         0.00               96.58                0.092109
 262144     100000         0.00               96.58                0.046055
 524288     100000         0.00               96.58                0.023027
 1048576    100000         0.00               96.58                0.011514
 2097152    100000         0.00               96.58                0.005757
 4194304    100000         0.00               96.58                0.002878
 8388608    100000         0.00               96.58                0.001439
---------------------------------------------------------------------------------------
Ahhhh.....much better. That's more like it.
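As a cross-check on the table, the average bandwidth column is consistent with message size times message rate: bytes x 8 bits x MsgRate (in millions of messages per second) gives Mb/s, and dividing by 1000 gives Gb/s. A small sketch using a handful of rows copied from the output above (the function name is illustrative, and decimal units are an assumption that happens to match the reported figures):

```python
def bandwidth_gbps(message_bytes: int, msg_rate_mpps: float) -> float:
    """Bandwidth implied by a message size and a message rate, in Gb/s (decimal)."""
    return message_bytes * 8 * msg_rate_mpps / 1000.0


# (bytes, reported BW average [Gb/sec], reported MsgRate [Mpps]) from the table.
rows = [
    (2, 0.066552, 4.159479),
    (1024, 36.74, 4.485289),
    (4096, 96.55, 2.946372),
    (65536, 96.58, 0.184218),
]

for size, reported, rate in rows:
    print(f"{size:>6} B: reported {reported} Gb/s, implied {bandwidth_gbps(size, rate):.2f} Gb/s")
```

It also shows why the small messages look so slow: the card is limited to roughly 4 to 4.6 million messages per second there, and only from 4096-byte messages upward does the link itself become the limit at about 96.6 Gb/s.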
As a follow-up to my previous blog post about the data corruption issue that I was experiencing with the Intel Core i9-12900K processor that was running on the Asus Z690 Prime-P D4 motherboard, Intel has offered a full refund on the defective unit whilst Asus has not.
So, moral of the story:
Don't buy from Asus.
I mean, clearly, the interaction between the Intel Core i9-12900K and the Asus Z690 Prime-P D4 motherboard was causing the system to spontaneously reset itself when I attempted to run memtest86 a second time. The memory was from my AMD Ryzen 9 5950X system (which was also using an Asus motherboard) and had PASSED memtest86 on said Ryzen platform. If putting those four DIMMs into the Asus Z690 Prime-P D4 motherboard results in the system spontaneously resetting itself, that's NOT a good sign of a reliable motherboard.
Asus was ONLY willing to offer an RMA repair. I told them that the CPU is in the process of being sent back, so even if they attempted to repair the board, I would have no way of verifying whether the issue is still there or not, because the CPU would've already been sent back, and I'm not buying another Alder Lake CPU from Intel only to give this problem the chance to repeat itself.
So, moral of the story:
Don't buy from Asus.
Welp, this happened: