Tech in Wonderland

07 December 2024

The database that's behind one of my Photoprism instances took a dump.

This one is for all the Linux fanbois out there. It isn't actually, technically, the fault of Linux, but considering that said Linux fanbois will bitch about software that runs on Windows as if that's a Windows problem, this post is going to be in a similar vein.

So, I've got a bunch of separate Photoprism instances running, and each one uses MariaDB, defined in the same docker-compose.yml file, as its database backend. (Which is like some form of MySQL using the InnoDB engine. I don't really know much about it, so I'll just leave it at that.)

Anyways, for some inexplicable reason, the database took a dump and Photoprism stopped being able to communicate with it.

Found out that the database itself was corrupt, and then tried restarting the mariadb part of the docker-compose.yml file with the --innodb-force-recovery=2 option (because when I tried to run it with --innodb-force-recovery=1, it was still producing a fatal 11 error). I also had to add --skip-grant-tables to the command that starts mariadb because without it, not even the root user, from inside the container, could administer the database, which is very strange because normally, a root user should be able to do anything and everything.

When I tried to run mariadb-check --all-databases, it says:

photoprism.photos

Warning: InnoDB: Index `idx_photos_checked_at` contains 1832435 entries, should be 1832246.

error: corrupt

And then I tried to run mariadb-check -f --all-databases and it said:

(for all of the photoprism.* tables)

note: The storage engine for the table doesn't support repair.

Anyways, long story short -- the database back end took a dump, so I ended up deleting the whole thing and my system is in the process of re-indexing everything so that it can rebuild the database.

Linux sucks.

*edit*

Yes, I did try to back up the database, drop the database, re-create the database, and then import everything from the backup.

The new error that I got with that was:

[Warning] failed to load slave replication state from table mysql.gtid_slave_pos: 1932: Table `mysql.gtid_slave_pos` doesn't exist in engine

I tried to quickly google how I can re-create it, but the SQL statements seemed awfully specific to the specific person who was asking that question, and so, I assumed that that may NOT necessarily be how that table needs to be defined for use by Photoprism. So, I just ended up deleting the entire database, and let it re-build said database from scratch, all over again. We'll see how that goes. What a colossal PITA it is, to try and fix/repair said database though.
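For reference, the back up / drop / re-create / import sequence described in the edit above can be sketched roughly as follows. This is only an illustration of the order of the steps, not the exact commands that were run: the database name `photoprism` comes from the mariadb-check output, whilst the dump file name and the `root` default are assumptions, and each command would have to be run from inside the MariaDB container.

```python
import shlex


def recovery_commands(database, dump_file, user="root"):
    """Return the dump / drop / re-create / import steps as argv lists.

    The file name and user are hypothetical placeholders; nothing is executed here.
    """
    dump = ["mariadb-dump", f"--user={user}", f"--result-file={dump_file}", database]
    drop = ["mariadb", f"--user={user}", "-e", f"DROP DATABASE `{database}`"]
    create = ["mariadb", f"--user={user}", "-e", f"CREATE DATABASE `{database}`"]
    # The import step reads the dump file on stdin, i.e.
    #   mariadb --user=root photoprism < photoprism-backup.sql
    load = ["mariadb", f"--user={user}", database]
    return [dump, drop, create, load]


commands = recovery_commands("photoprism", "photoprism-backup.sql")
for argv in commands:
    # Print each step as a shell-quoted command line for review before running it.
    print(shlex.join(argv))
```

Printing the commands rather than executing them is deliberate: with a database that is already corrupt, it seems safer to review every step by hand first.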

03 February 2023

Your BIOS is broken; bad RMRR

Welp.

This happened:

"Your BIOS is broken; bad RMRR"

Oh joys.

[/sarcasm]

31 January 2023

EVGA Nvidia GTX 980 passthrough in Proxmox does not work.


I've basically tried everything at this point, and nothing has let me pass my EVGA Nvidia GTX 980 SC w/ ACX 2.0 through to a Windows 10 VM in Proxmox 7.3-3.

I figured that I would leave this nugget for people to find, should other people attempt the same in the future.

It didn't matter what settings I used for the /etc/pve/qemu-server/<<VMID>>.conf and/or /etc/default/grub.

Nothing worked. Save yourself a lot of time and skip trying.

I get the Windows Code Error 43 where it can tell that it is a GTX 980, but it can't start the driver for it.

22 October 2022

Paradoxical conundrum...

I bought a new Core i7-13700K and installed it into the Asus Z690 Prime P that I bought about a year ago.

Question: How do you update the BIOS for a system that won't POST (so that it would be able to recognise said new processor)?

24 August 2022

It's amazing how power efficient new systems are compared to (much) older ones

Father-in-law, I think, was originally having some kind of problem with his old, old computer, and as a result, I ended up giving him my old Intel Core 2 Quad Q9550 system.

Recently, said Q9550 system started to have some issues, so I gave him my Intel NUC NUC7i3BNH (Intel Core i3-7100U (2-core, HTT enabled), originally 4 GB of RAM, but I upgraded that to 8 GB (2x 4 GB), and it also originally came with a 16 GB Intel Optane module and a 1 TB HGST 2.5" 7200 rpm HDD, but I swapped that out I think for an Intel 520 Series SSD). Anyways, but I digress.

I don't know if I ever took power measurements for the NUC (probably not), but let's instead, compare it for example to the Beelink GTR5 5900HX system, which, at idle, could be sipping somewhere between 9-maybe 16 W of power.

Compare and contrast that to the old Q9550 system which has 4x 2 GB G.Skill DDR2-800 RAM, and a Nvidia GTX 260 in it, with a 610 W PSU, and a single (I think it's an Intel 525s Series) 240 GB SSD in it. At idle, it is sucking back somewhere between 120-160 W.

That's CRAZY!!!

I thought that I was going to re-purpose that system to be a server of some kind. But now, I'm not so sure.

Granted, the Q9550 system does have a Gigabyte EP45-UD3P motherboard in it, and as such, sports 8 SATA 3 Gbps ports. And the Intel Core 2 Quad Q9550 does support Intel VT-x and Intel VT-d, which means that, again, in theory, I can run a few virtual machines on it and throw TrueNAS onto that system and make it into a storage server.

I don't know if I'm going to do that for sure yet, but it is a potential option.

But man, that idle power is really making me re-think that plan. (Sadly, I'm not sure if newer servers would really be that much more efficient. Desktop systems and/or mini-PCs, yeah, but towers and/or servers - I don't know about that.)
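To put those idle numbers into perspective, here is a quick back-of-the-envelope calculation of what each system would draw over a year. It assumes that the system sits at idle 24/7, which is my simplification (a server would spend at least some time under load), and it only uses the idle ranges quoted above.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the box is powered on and idle all year


def annual_kwh(idle_watts):
    """Convert a constant idle draw in watts into kWh per year."""
    return idle_watts * HOURS_PER_YEAR / 1000


# Idle ranges (low, high) in watts, as quoted in the post.
systems = {
    "Q9550 tower": (120, 160),
    "Beelink GTR5 5900HX": (9, 16),
}

for name, (low, high) in systems.items():
    print(f"{name}: {annual_kwh(low):.0f}-{annual_kwh(high):.0f} kWh/year at idle")
```

Multiply the result by whatever your local electricity rate is to get the yearly cost of just leaving it on.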

The BitComet client has gone to shit.

https://www.cometforums.com/topic/12802501-bitcomet-causing-excessive-ping-times/page/2/

"If you are so unhappy using BitComet as your client (you have previously stated that you are using it at the same time as two other clients), I suggest that, at least, you show that you do possess some amount of common courtesy."

Read my initial posts.

I was merely and simply stating "hey, I think there's a problem here".

Rhubarb repeatedly denied that the problem even exists, let alone offer anything that resembles help and/or assistance.

Rhubarb's response is akin to how companies blame independent media outlets when said independent media outlets find issues with said company's products. (Which Steve from GamersNexus makes references to here:

The fact that Rhubarb argues on here that it's not about ping times, when I cited the old forum posts where Rhubarb even SPECIFICALLY and EXPLICITLY asks for ping time data, is laughable at the very least.

Interesting how you make no mention of this fact in your reply.

If Rhubarb is going to be belligerent, then you can't be surprised when said belligerence is going to be met with belligerence.

If Rhubarb doesn't know how to help and/or doesn't want to help, then he could've just plainly and pointedly stated that.

But that wasn't the case here.

"I have been a resident of BitComet Forums, assuredly, a lot longer than yourself, and I am always amazed at how imperious some users like to sound and, rather than thanking those who donate their free time to attempt to aid others (not being remunerated, by the way), feel it is their God-given right to insult and try to belittle them, just because they do not see eye-to-eye with what is suggested. Thank Goodness that this is not the case of the vast majority of the more than 100,000 worldwide users of this free application!! "

Once again, if you have actually READ my posts and Rhubarb's responses, it LITERALLY reads:

Me: "Hey, I think there's a problem with the program."

Rhubarb: "No, there isn't."

Me: "Yes, there is. And here is the data to prove it."

Rhubarb: "No, there isn't."

Me: "Well, I googled it and this is how I found this forum because other people reported about the same thing."

Rhubarb: "If this was a problem, there'd be all sorts of 'me too' posts."

Me: "But there are 'me too' posts."

Rhubarb: "No, there isn't."

Me: "Yes, there are. Here are the quotations from those threads, and here are the links to those posts."

Rhubarb: "No, there isn't."

(see a pattern here?)

So, why would I thank someone who is in denial about a problem???

That makes no sense.

Would you ever thank an alcoholic that beats their wife and kids "thank you for beating me?" (because you're an alcoholic) That's absolutely ridiculous.

You can literally conduct your internal review of how Rhubarb could've handled this better cuz right now, he's at the same level as the Enermax issue.

"Would that decrease your infinite rage and please your Magnanimous self? "

Why would I need to do that?

Again, the other thread tells you that it's a problem with how the client tries to establish a connection to the DHT network.

It's a very simple question: on startup, what does the client attempt to do as it is trying to establish a connection to the DHT network?

It would appear that no one here has ever bothered to ask this very simple, basic question as it pertains to this issue which might be the responsible party for both, this thread, and the previous thread that was filed a year and 8 months ago.

"Thank them profusely for fixing my car and go on my merry way. There is a Spanish proverb that says that 'to be thankful is a sign of being well-bred' ("Ser agradecido es de ser bien nacido"). "

And that's the difference between your mechanic and Rhubarb.

Rhubarb never made it to trying to profusely fix the client.

That's the difference between your scenario and this one.

---

The response from their forum is an example of how not to handle a problem when users/clients are reporting a problem.

You can read the rest of the thread to see what I'm talking about there.

BitComet sucks.

Use something else instead.

*edit 2022-09-01*

BWAHAHAHAHA.....

The mods at the BitComet forum have now banned me from said forum because I reported an issue, and they refused to fix it.

LOL...LMAO....

Fuck BitComet. It's LITERAL trash.

22 August 2022

AMD Ryzen 9 5950X may NOT be as fast for CFD as I otherwise thought/hoped

So this test is based on the same testcase, but just testing it with two different CFD applications.

Both are steady-state solutions (which is normally used to initialise the flow field for the transient solution, which I am not testing at the moment).

The AMD Ryzen 9 5950X cluster is two nodes, where each node has an AMD Ryzen 9 5950X (16-cores, SMT disabled), 128 GB of DDR4-3200 unbuffered, non-ECC RAM, and a Mellanox ConnectX-4 MCX456A-ECAT 100 Gbps Infiniband network card whilst the Xeon cluster is two nodes, each with dual Intel Xeon E5-2690 (V1, 8-cores each, HTT disabled for both processors), 128 GB of DDR3-1866 2Rx4 Registered ECC RAM running at DDR3-1600 speeds.

In one of the applications, the AMD Ryzen 9 5950X finishes the solution in 23342.021 seconds whilst the Xeon pair of nodes finishes the same steady state solution in 15834.675 seconds (or about a 32.16% reduction in wall clock time), which is rather significant. This run has about 13.4 million cells and it takes this long because it is running for 1000 iterations.

And then in another, different CFD application, also running the steady-state solution, this time for 48 iterations, the AMD system finishes the solution in 292.665 seconds whilst the Xeon system finishes it in 264.48 seconds, or about a 9.63% reduction in wall clock time.
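The percentages quoted for the two runs can be reproduced from the wall clock times alone. The short sketch below computes both the reduction in wall clock time (relative to the slower Ryzen run) and the equivalent speedup factor; the function names are just mine for illustration.

```python
def wall_clock_reduction_pct(slower_s, faster_s):
    """Percentage of the slower run's wall clock time saved by the faster run."""
    return (slower_s - faster_s) / slower_s * 100


def speedup(slower_s, faster_s):
    """How many times faster the faster run is."""
    return slower_s / faster_s


# (Ryzen 9 5950X cluster seconds, Xeon E5-2690 cluster seconds), from the post.
runs = {
    "first application (1000 iterations)": (23342.021, 15834.675),
    "second application (48 iterations)": (292.665, 264.48),
}

for name, (ryzen_s, xeon_s) in runs.items():
    print(
        f"{name}: {wall_clock_reduction_pct(ryzen_s, xeon_s):.2f}% less wall clock time "
        f"on the Xeons ({speedup(ryzen_s, xeon_s):.2f}x)"
    )
```

Note that a 32.16% reduction in wall clock time works out to the Xeon cluster being roughly 1.47x as fast, which is the more striking way of putting it.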

That's really interesting that the AMD Ryzen 9 system, despite it being 8 and a half years newer, still isn't able to be as fast as an older Xeon-based cluster.

The only real upside to using the Ryzen-based system over the Xeon based system -- well, two things actually are:

1) The Ryzen based system uses quite a lot less power compared to the Xeon cluster. It isn't surprising to see power consumption, under load, of around or upwards of 1 kW for just running two Xeon nodes (and running all four nodes pushes that total up to somewhere between 1.6-1.9 kW) whereas the Ryzen based systems, combined, are probably only using about 400 W total.

2) The Ryzen based system is a LOT quieter than the Xeon Supermicro Twin Pro^2 server (6027TR-HTRF).

So, if you're running it in a home lab environment where you don't live by yourself, then despite it being slower, it might still be a better alternative for these two reasons.

And the Ryzen based solution is certainly cheaper than the Threadripper, Threadripper Pro, and/or AMD EPYC platforms, where you might be able to get some of that performance back, but I can't say for certain without actually testing it myself. I thought that having the 16 faster clock speed cores on the Ryzen 9 5950X would make it faster than the Xeon E5-2690 platform. Based on the data and the results, I stand corrected.

I did not expect that.

15 June 2022

Engineering data consolidation efforts

Since I built my Ryzen 5950X system, and the 12900K system, and then had to completely disassemble the 12900K system, and then built another Ryzen 5950X system whilst arguing with Asus, I was in the middle of a data consolidation effort for all of my engineering data from the various projects that I've worked on over the years.

Today marks the day where the first pass of this data consolidation effort has completed and I ended up saving almost 14 TB of storage space.

It feels nice, and I get a sense of accomplishment as the data is being written to tape right now.

I can't believe that it's taken me like about 6 months to finish this data consolidation effort.

At some points during the process of unpacking, packing, and then re-packing the data, both Ryzen 5950Xs and also the Intel Core i7-4930K that's in the headnode were oversubscribed 3:1 when they were processing the data. That just seems pretty crazy to me because that's also a little bit of an indication as to how much work the CPUs had to do to process and re-process the data.

Not to mention, my poor, poor hard drives, that have been working so hard throughout all of this.

13 June 2022

Welp....this is a problem.

Let me begin with the problem statement:

---------------------------------------------------------------------------------------
                    Send BW Test
Dual-port       : OFF        Device         : mlx5_0
Number of qps   : 1        Transport type : IB
Connection type : RC        Using SRQ      : OFF
TX depth        : 128
CQ Moderation   : 100
Mtu             : 4096[B]
Link type       : IB
Max inline data : 0[B]
rdma_cm QPs    : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x0c QPN 0x008d PSN 0x277b7c
remote address: LID 0x05 QPN 0x010f PSN 0xda4554
---------------------------------------------------------------------------------------
#bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
2          100000           0.000000            0.064007            4.000452
4          100000           0.00               0.11             3.516592
8          100000           0.00               0.26             4.078050
16         100000           0.00               0.52             4.069701
32         100000           0.00               1.05             4.086223
64         100000           0.00               2.09             4.074705
128        100000           0.00               4.27             4.167070
256        100000           0.00               9.31             4.547246
512        100000           0.00               12.20             2.978638
1024       100000           0.00               13.17             1.607263
2048       100000           0.00               13.64             0.832231
4096       100000           0.00               13.82             0.421746
8192       100000           0.00               13.96             0.212971
16384      100000           0.00               14.08             0.107404
32768      100000           0.00               14.12             0.053869
65536      100000           0.00               14.17             0.027029
131072     100000           0.00               14.19             0.013528
262144     100000           0.00               14.17             0.006759
524288     100000           0.00               14.15             0.003375
1048576    100000           0.00               14.16             0.001688
2097152    100000           0.00               14.14             0.000843
4194304    100000           0.00               14.13             0.000421
8388608    100000           0.00               14.12             0.000210
---------------------------------------------------------------------------------------

What you see above are the results from the 100 Gbps Infiniband network bandwidth test between my two AMD Ryzen 5950X systems. Both of them have a discrete GPU in the primary PCIe slot, and then the Mellanox ConnectX-4 dual port, 100 Gbps Infiniband NIC is in the next available PCIe slot.

I can't really tell from the motherboard manual for the Asus ROG Strix X570-E Gaming WiFi II motherboard what speed the second PCIe slot is supposed to be when there is a discrete GPU plugged into the primary PCIe slot.

The Mellanox ConnectX-4 card is a PCIe 3.0 x16 card, which means that the slot itself is supposed to support up to 128 Gbps (and the ports themselves are supposed to go up to a maximum of 100 Gbps out of the 128 Gbps that's theoretically available). If the slots were running as PCIe 3.0 x4, it should be capable of 32 Gbps.

As the results show, clearly, that is not the case.
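To get a rough idea of what the slot might actually be running at, the sketch below works out the usable bandwidth of a PCIe link from the per-lane transfer rate and the line encoding overhead (8b/10b for PCIe 1.x/2.0, 128b/130b for 3.0/4.0), and then lists the link configurations whose ceiling sits just above the ~14.19 Gbps that was measured. The 20% window is an arbitrary choice of mine, and this doesn't account for any protocol overhead on top of the encoding, so treat it as a hint rather than a diagnosis.

```python
# Per-lane raw rate in GT/s and line-encoding efficiency for each PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}
LANE_WIDTHS = (1, 2, 4, 8, 16)


def pcie_gbps(generation, lanes):
    """Usable link bandwidth in Gbps after line encoding, before protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency * lanes


measured_gbps = 14.19  # best "BW average" from the first test

# Link configurations whose ceiling is above the measurement, but by no more than 20%.
candidates = [
    (generation, lanes)
    for generation in PCIE_GENERATIONS
    for lanes in LANE_WIDTHS
    if measured_gbps < pcie_gbps(generation, lanes) <= measured_gbps * 1.2
]

print(f"PCIe 3.0 x16: {pcie_gbps(3, 16):.2f} Gbps")
print(f"PCIe 3.0 x4:  {pcie_gbps(3, 4):.2f} Gbps")
for generation, lanes in candidates:
    print(f"candidate: PCIe {generation}.0 x{lanes} = {pcie_gbps(generation, lanes):.2f} Gbps")
```

After encoding, the 128 Gbps and 32 Gbps raw figures come out closer to 126 Gbps and 31.5 Gbps. The measured result sits under a roughly 16 Gbps ceiling, which would be consistent with something like a PCIe 2.0 x4 or PCIe 3.0 x2 link, but I can't confirm which one from the bandwidth test alone. The negotiated link speed and width reported by `lspci -vv` (the LnkSta line) would be the way to check.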

I'll have to see if I can run both of those systems without the discrete GPU, so that I can plug the Mellanox cards into the primary PCIe slot.

*Update 2022-06-14*:

So I took out the discrete GPUs from both systems and put the Mellanox card into the primary PCIe slot and this is what I get from the bandwidth test results:

---------------------------------------------------------------------------------------
                    Send BW Test
Dual-port       : OFF        Device         : mlx5_0
Number of qps   : 1        Transport type : IB
Connection type : RC        Using SRQ      : OFF
TX depth        : 128
CQ Moderation   : 100
Mtu             : 4096[B]
Link type       : IB
Max inline data : 0[B]
rdma_cm QPs    : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x0c QPN 0x008c PSN 0x5ccdd5
remote address: LID 0x05 QPN 0x010a PSN 0x178491
---------------------------------------------------------------------------------------
#bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
2          100000           0.000000            0.066552            4.159479
4          100000           0.00               0.11             3.529205
8          100000           0.00               0.27             4.225857
16         100000           0.00               0.54             4.254547
32         100000           0.00               1.09             4.254549
64         100000           0.00               2.19             4.276291
128        100000           0.00               4.51             4.408332
256        100000           0.00               9.21             4.498839
512        100000           0.00               18.60             4.540925
1024       100000           0.00               36.74             4.485289
2048       100000           0.00               75.76             4.623960
4096       100000           0.00               96.55             2.946372
8192       100000           0.00               96.57             1.473530
16384      100000           0.00               96.58             0.736823
32768      100000           0.00               96.58             0.368421
65536      100000           0.00               96.58             0.184218
131072     100000           0.00               96.58             0.092109
262144     100000           0.00               96.58             0.046055
524288     100000           0.00               96.58             0.023027
1048576    100000           0.00               96.58             0.011514
2097152    100000           0.00               96.58             0.005757
4194304    100000           0.00               96.58             0.002878
8388608    100000           0.00               96.58             0.001439
---------------------------------------------------------------------------------------

Ahhhh.....much better. That's more like it.

05 April 2022

Moral of the story: Do NOT buy from Asus. Intel is willing to offer a refund. Asus is not.

As a follow-up to my previous blog post about the data corruption issue that I was experiencing with the Intel Core i9-12900K processor that was running on the Asus Z690 Prime-P D4 motherboard, Intel has offered a full refund on the defective unit whilst Asus has not.

So, moral of the story:

Don't buy from Asus.

I mean, clearly, if the interaction between the Intel Core i9-12900K and the Asus Z690 Prime-P D4 motherboard is causing the system to spontaneously reset itself when I attempted to run memtest86 a second time, using the memory that was from my AMD Ryzen 9 5950X (which was also using an Asus motherboard), which PASSED memtest86 on said Ryzen platform, and by putting those four DIMMs into the Asus Z690 Prime-P D4 motherboard, it results in the system spontaneously resetting itself; that's NOT a good sign of a reliable motherboard.

Asus was ONLY willing to offer an RMA repair, and I told them that the CPU is in the process of being sent back, so even if they attempted to repair it, I would have no way of verifying whether the issue is still there or not because the CPU would've already been sent back and I'm not buying another Alder Lake CPU from Intel only to give it the chance for this problem to repeat itself.

So, moral of the story:

Don't buy from Asus.