Interesting. So that would mean you'd still need a 40 or 80 GB card to run the larger models (30B, 70B, or 8x7B LLMs) and to train them.
Or would it be possible to split the model layers between the cards like you can between RAM and VRAM? I suppose in that case each card would be able to evaluate the results of the layers in its own memory and then pass those results to the other card(s) as necessary.
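For what it's worth, that's roughly how naive model (pipeline) parallelism works: each card holds a slice of the layers and only the intermediate activations get shipped to the next device. Here's a minimal sketch of the idea in plain PyTorch; the layer count, hidden size, and two-GPU layout are just placeholder assumptions, not any particular model's actual architecture:

```python
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    """Toy model with its layers split across two GPUs."""
    def __init__(self, hidden=4096, n_layers=8):
        super().__init__()
        # First half of the layers lives on GPU 0, second half on GPU 1.
        self.first_half = nn.Sequential(
            *[nn.Linear(hidden, hidden) for _ in range(n_layers // 2)]
        ).to("cuda:0")
        self.second_half = nn.Sequential(
            *[nn.Linear(hidden, hidden) for _ in range(n_layers // 2)]
        ).to("cuda:1")

    def forward(self, x):
        x = self.first_half(x.to("cuda:0"))
        # Only the activations cross the PCIe/NVLink boundary here.
        x = self.second_half(x.to("cuda:1"))
        return x

model = SplitModel()
out = model(torch.randn(1, 4096))
```

In practice you usually wouldn't wire this up by hand; libraries like Hugging Face Accelerate can spread a model's layers over multiple GPUs (and CPU RAM) automatically, e.g. by loading with `device_map="auto"`.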