StarWind Virtual SAN (VSAN) [+Free], HCI Appliance (HCA), Virtual HCI Appliance (VHCA) [+Free] • HCI Lab ng: All NVMe, 3x R740 with 10G and 100G

Good morning Gents!
It's been a while :) The last StarWind-powered cluster I built and ran is probably more than 10 years in the past.
(StarWind v8 back in 2014; I was one of the beta testers.)
I "played" a lot with MS S2D and VMware vSAN and went down all (...okay, a lot of...) rabbit holes.
Now I am at a point where I have the liberty to set up the "next generation" cluster for my home lab, which is very exciting for me.
It took me years to get my hands on the right gear and put together a powerful selection of hardware to push the limits a bit further.
My old cluster (S2D) was based on 2x Dell R720s with three-tiered storage: NVMe, SSD and spinning rust, synced over two 40G ConnectX-2 cards.
That worked pretty well. It ran for over 5 years and I never encountered unexpected outages while enjoying blazing-fast performance.
Since I am a techie at heart and doing the same thing over and over again is kind of boring, I am now searching for a "new" way to do things and want to learn along the way.

For the new HCI-Cluster I have three identical R740 Servers, each with:
- 2x Xeon(R) Gold 6134
- 256 GB DDR4 Memory
- 2x Mellanox ConnectX-3 VPI: each with 2x25GBit -> Total of 4x 25GBit
- 1x Mellanox ConnectX-3: with a single 100G Port
- 3x Samsung PM983 NVMe 1.92 TB
- 1x Intel DC P4608 NVMe 6.4 TB (gets exported as 2x 3.2 TB to the OS)

My network around the cluster is "quite" redundant, but not fully redundant yet:
L2 and L3 are redundant (MLAG and VRRP) for 10G (two MikroTik CCR2004 and two CRS317).
100G, on the other hand: I currently only have one MikroTik CRS504, so my 100G links are NOT redundant yet (quick bandwidth math below).
(CRS520s are quite expensive and NOT yet considered WAT: Wife-Accepted Technology.) :lol:
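
For context, here is the raw line-rate math I keep in my head for the possible sync paths. Just a small Python sketch: raw link speeds only, ignoring protocol overhead and any NIC/PCIe limits.

def gbit_to_gbyte_per_s(gbit: float) -> float:
    # 8 bits per byte; no overhead accounted for
    return gbit / 8.0

paths = {
    "1x 100G (CRS504)": 100,
    "4x 25G (the VPI ports)": 4 * 25,
    "10G fallback (CCR2004/CRS317)": 10,
}
for name, gbit in paths.items():
    print(f"{name}: ~{gbit_to_gbyte_per_s(gbit):.2f} GB/s theoretical")
# -> ~12.50, ~12.50 and ~1.25 GB/s respectively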

In my free time over the last couple of weeks I played with Proxmox VE and Ceph, and I learned a lot.
It's fun to play with and I'd like to push the solution further, but Ceph isn't there yet.
EC pools are inefficient and there is A LOT OF OVERHEAD going on.
Mirroring is faster, but also not AS FAST as I imagine things could be.
My testing isn't very thorough yet; currently I am hitting the limits of this solution at around 2600-2800 MB/s of writes.
(Using plain Ethernet instead of RoCE.) RoCE support is terrible, from my perspective and with my current understanding.
It makes things slower instead of faster. ¯\_(ツ)_/¯
I set up PFC and DCB, configured lossless Ethernet on the CRS504 and made sure the traffic lands in class 3... but still: "horrible performance". (RoCE v1 and v2 perform more or less equally badly in my lab.)
My gut feeling is that driver support for those Mellanox cards could/should get better by an order of magnitude. Also, current Ceph development is focusing on improving EC performance and efficiency... I think it will be very interesting, at least, to see what Ceph can do in maybe a year.

About my goals:
I'd like to be able to push the hardware in my home lab as close to the edge (performance-wise) as possible.
For the 5 NVMe disks per node: some sort of parity instead of mirroring would be really cool, to increase storage efficiency (see the quick capacity math after this list).
I'd love to use RoCEv2 for storage sync.
It would be totally sufficient for my lab to run on two of the three servers; I only have three because the air gets very "thin" when it comes to solutions that work great with 2 nodes. So the third is there for testing and evaluating different solutions, but it is really not my end goal. (The power bill is already very, very, very *ouchy*.)
I'd like to use the 100G network as "first choice" for storage sync, but I want it to fail over to my 10G infra in case the 100G network goes down due to... well... the SPOF.
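
To make the "storage efficiency" goal a bit more concrete, here is a quick usable-capacity sketch in Python. The drive counts and sizes are my actual nodes; the layouts are just generic examples for comparison (and the parity cases assume equal-size members, which my mix of 1.92 TB and 3.2 TB devices is not), nothing StarWind-specific:

node_raw_tb = 3 * 1.92 + 2 * 3.2  # 3x PM983 + the P4608 seen as 2x 3.2 TB = 12.16 TB

layouts = {
    "2 nodes, 2-way mirror":                 (2, 1 / 2),
    "3 nodes, 3-way mirror":                 (3, 1 / 3),
    "3 nodes, 2+1 parity across nodes":      (3, 2 / 3),
    "2 nodes, 4+1 parity per node + mirror": (2, (4 / 5) * (1 / 2)),
}
for name, (nodes, efficiency) in layouts.items():
    raw = nodes * node_raw_tb
    print(f"{name}: ~{raw * efficiency:.1f} TB usable of {raw:.1f} TB raw")
# -> ~12.2 / 24.3, ~12.2 / 36.5, ~24.3 / 36.5 and ~9.7 / 24.3 TB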

Let me know what you think. I would be very interested to read what kind of setup you would recommend with your software products and what you would recommend to AVOID doing.

Have a great time!

Kind regards

Statistics: Posted by qwertz — Mon Dec 30, 2024 11:29 am


