There’s something profoundly beautiful about how pretty much any problem in the world can be transformed into a mapping of X to Y, or ŷ, for that matter. Take art. Music. Nuclear fusion. Pick your problem. More intelligence is always better. The problem of evil and bad actors is something we’ll have to figure out, but there are also a lot of areas in life where I think it’s not unreasonable to say that we tend to be quicker to throw rocks at other people than to question the premises behind our presumptions. So perhaps A.I. will be a fairer arbiter still…?

Regardless, it’s one area, one foundational technology, in which you have to revisit your priors on a weekly basis and build forwards - build for a future that will be 10x, 1000x cheaper, faster, better. I cannot begin to make sense of it, to be honest. But I do find that the best way to learn and apply A.I. from first principles is to just do it while iterating quickly, connect it to what I already know, and try to explore unclaimed territories (hopefully).

So I got interested in this whole area around small, distributed models because getting access to GPUs is impossible - of course all the big actors and companies of the world have already “choped” (reserved) instances, so consumers are left without even a drip. I am interested in small models because even on an M2 Max with 64GB of unified memory, I get that message about running low on memory when trying to train a small Mixture of Experts model. Now that the training is finally done, I am wondering if I can dump more experts and data onto this toy and get it to performant levels for what I wanna do with Mandarin.

MLX MoE Inference on M2 Max

Small, Local Language Models

The problem of local language models really came to the fore for me looking at how terrible GPT-4 and even this toy model are at Mandarin. The good news there is that there are Chinese LLMs that I’ve tried and found to be usable - so I’m fairly confident that we can get a model that works well enough by building on those efforts. I’ve been curious about how the situation plays out for other ASEAN languages because of some translation and transcription work I was experimenting with for people, and there the gap to something of usable quality just feels much wider. I can’t help but wonder what will happen to local cultures and languages in a world awash in A.I. Do we really want the world’s content, knowledge and information to be generated by a few large companies concentrated in a particular part of the world?

Any form of ban is impractical because anyone rejecting this general purpose tool just needs to ask themselves: imagine what your adversary, competition or enemy will do with this new tool. If a ban is out of the question, it becomes even more urgent to make sure that we give people the tools and access that they’ll need to succeed. We cannot talk about diversity and inclusion in A.I. without representativeness in datasets, which is why I am constantly flabbergasted by people treating raw data as rough rocks at best and digital trash at worst, while such raw materials stay locked away somewhere. We shelve things away, and then we wonder why the datasets and the models of the world are not good for our purposes. Then we hope/pray some benevolent corporation somewhere will bestow the elixir of A.I. upon us on a silver platter. Sometimes woke can go too far, but we already have the technology of democracy to give people the freedom and the agency to co-exist with people whom we may not necessarily agree with, as long as we tolerate each other.

I’m also interested in this because it’s now solvable, like low-hanging-fruit kind of solvable. We just need to get, say, 10% better on each of the relevant components - a more performant base model or a better merged model, more data, further finetuning - and we’ll be home free with something usable. I’m planning on trying things like adding more experts and scraping and generating synthetic data, to see if I can bake a model that can do Mandarin and generate content and/or practice questions for the Chinese civil servant examination. That’ll be fun.
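
To make "adding more experts" a bit more concrete, here is a minimal sketch of top-k Mixture of Experts routing in plain Python. It is not the MLX model mentioned above: `ToyMoE`, `make_expert` and `add_expert` are hypothetical names, the experts are untrained random linear maps, and a newly added expert would still need training before it contributes anything useful.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    """Hypothetical expert: a single random dim x dim linear map."""
    weights = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


class ToyMoE:
    """Toy MoE layer: a linear gate picks top_k experts per input vector."""

    def __init__(self, dim, num_experts, top_k=2, seed=0):
        self.dim = dim
        self.top_k = top_k
        self.rng = random.Random(seed)
        self.experts = [make_expert(dim, self.rng) for _ in range(num_experts)]
        # One gating row per expert; the dot product with x is that expert's score.
        self.gate = [[self.rng.gauss(0, 0.1) for _ in range(dim)]
                     for _ in range(num_experts)]

    def add_expert(self):
        """Grow the layer by one expert and one matching gate row."""
        self.experts.append(make_expert(self.dim, self.rng))
        self.gate.append([self.rng.gauss(0, 0.1) for _ in range(self.dim)])

    def route(self, x):
        """Return (expert_index, weight) pairs for the top_k experts."""
        logits = [sum(g * xi for g, xi in zip(row, x)) for row in self.gate]
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = ranked[: self.top_k]
        weights = softmax([logits[i] for i in top])
        return list(zip(top, weights))

    def forward(self, x):
        """Weighted sum of the selected experts' outputs."""
        out = [0.0] * self.dim
        for idx, w in self.route(x):
            y = self.experts[idx](x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


moe = ToyMoE(dim=4, num_experts=4, top_k=2)
x = [1.0, 0.5, -0.5, 2.0]
print("routing before:", moe.route(x))
moe.add_expert()
print("experts after add_expert:", len(moe.experts))
```

Only `top_k` experts run per input, so adding experts grows the parameter count (and memory footprint) without growing per-token compute by the same factor. That memory growth is the part that matters on a 64GB machine.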

Tend to Your Garden

I was quite inspired by the idea of not abstracting too early when it comes to A.I. at this stage because the sand beneath our feet is shifting so fast. That was a damn solid reminder to ask why I am even letting myself spend more time than necessary on this problem of serving multiple users concurrently and maintaining state over a stateless microservices architecture in WhatsApp; that problem is not the most critical point of focus at this stage, nor is it my area of specialty per se.

I am much better off tending to my model garden. But I’ll let myself take a stab at the aforementioned problem for one last round this weekend and see how far we go, lol.
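
For reference, the shape of that concurrency problem can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual WhatsApp setup: `Session`, `SessionStore` and `handle_message` are hypothetical names, and the in-memory dictionary stands in for whatever external store (a database or a cache) a real deployment would use so that the handler itself can stay stateless.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Session:
    """Per-user conversation state plus a lock guarding it."""
    history: list = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)


class SessionStore:
    """In-memory stand-in for an external session store (assumption)."""

    def __init__(self):
        self._sessions = {}
        self._guard = threading.Lock()

    def get(self, user_id):
        # The guard makes "look up or create" atomic across threads.
        with self._guard:
            if user_id not in self._sessions:
                self._sessions[user_id] = Session()
            return self._sessions[user_id]


def handle_message(store, user_id, text):
    """Stateless handler: all state lives in the store, keyed by user."""
    session = store.get(user_id)
    # Per-user lock: one user's messages are serialized, others proceed.
    with session.lock:
        session.history.append(text)
        turn = len(session.history)
    return f"[{user_id}] turn {turn}: {text}"


def simulate(store, user_id, count):
    """Send `count` messages for one user, as a webhook worker might."""
    for i in range(count):
        handle_message(store, user_id, f"msg {i}")


store = SessionStore()
# Two workers serve "alice" at the same time, one serves "bob".
workers = [
    threading.Thread(target=simulate, args=(store, user, 50))
    for user in ("alice", "bob", "alice")
]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(len(store.get("alice").history))  # 100
print(len(store.get("bob").history))    # 50
```

Keying everything by user ID is what lets any worker pick up any message. Swapping the dictionary for a shared store would be the next step, and it is exactly the kind of abstraction that can wait.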

Originally published on PubPub at erniesg.pubpub.org/pub/4nrd812x.

A.I. for humans be like: it's just X -> ŷ
