rafaquintanilha 2 days ago [-]
I have no affiliation with them but here's what I think happened:
1. They claim the official model is based on Qwen 397B. It's likely they didn't disclose Nex Pro at all because Nex itself is based on the same base model (not saying they shouldn't).
2. The improvement would come from merging the weights PLUS on-policy distillation. The confusion is that the uploaded model didn't have the distillation at all.
3. It's important to note they didn't advertise the model besides posting it on Reddit 2 days ago. It went viral organically, over the weekend, and during Brazil's World Cup debut (Brazilians will understand). Of course the mayor of Rio took the opportunity to capitalize on the free coverage, but that wasn't done in conjunction with the researchers.
4. I don't see why they would disclose Qwen 397B as base and mention the SwiReasoning paper but not mention Nex if all they did was to merge both models.
5. In any case, what they are claiming is easily verifiable once (if) they upload the right model.
This should be at the top: they uploaded the wrong model, they fixed it
jwitthuhn 1 days ago [-]
They did upload the wrong model but as of the time of writing they have not fixed it. Right now, 12 hours after they took the old one down, there is simply no model present in their huggingface repo.
xiphias2 1 days ago [-]
I guess they will upload it later, it seems like an honest mistake to me.
Anyway, the SwiTransformer paper looks interesting, and so does post-training to optimize for it.
I'm honestly impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is probably the last headline I ever expected to read on HN.
airstrike 2 days ago [-]
Worth reminding everyone that Lua was also created in Rio, though admittedly at PUC rather than by the government.
Rio has a strong engineering talent pool, along with many other major capitals in Brazil
matheusmoreira 2 days ago [-]
Brazil does have talent. Mauro Carvalho Chehab is a Linux kernel maintainer. Elixir was created by José Valim, a Brazilian. I have also created my own programming language.
What Brazil doesn't have is a history of properly rewarding talent, which often causes it to migrate elsewhere. So it's definitely surprising when any sort of technological development happens in Brazil: it implies someone who stayed managed to get something done, most likely for much less than what that something is actually worth, while also being crushed by extremely high taxes that essentially doubles the cost of computer hardware.
red-iron-pine 1 days ago [-]
> extremely high taxes that essentially doubles the cost of computer hardware.
I think people are missing the last few words -- cost of computing hardware
when I used to do ISP work I did a lot for LATAM. The joke was that you'd get better bandwidth for Brazil routing out of the country and through Miami than going across the country. The reason? crazy high tariffs on hardware.
No reason to base anything locally, and if you're not basing it locally then there isn't really much reason to stick around, either. Go to other hot markets like Zona America, Austin, CDMX, Miami, Los Angeles, etc. and make the big $$$.
I worked with 2 Brazilian engineers who were in country (and currently work with a 3rd, based in Montreal) and they were very good, but all said they had to get out of the country to lock in the serious engineering roles.
rbanffy 1 days ago [-]
> extremely high taxes
I always find this funny. Brazilian taxes are nowhere near what I would say “high”. I pay about twice as much out of my compensation as I would pay in Brazil, and that would be as if I did zero tax optimisation back then.
fabioz 1 days ago [-]
I can second this.
Compared to many countries Brazil doesn't have such high taxes (I'd say that if you work remotely for a company outside of Brazil, you'll probably have much lower taxes compared to almost any other country -- working locally the difference isn't as big, but you have higher taxes in many other places).
What it really lacks is access to capital (which is the real "mojo" of the US compared to the rest of the world).
iterateoften 1 days ago [-]
Also the bureaucracy, employee rights, etc.
Incorporating and getting a functional business entity in Brazil is harder. In the USA I literally do it in 5 minutes online, including the bank account. In Brazil they are taking out microscopes to verify that your signature on the paperwork matches.
And in the USA if you have one bad employee, you just fire them any time. In Brazil, for better or for worse, it's nowhere near as easy. Obviously better for employees, but businesses don’t like it because you can get stuck with an employee dragging down everyone unless you pay them a year's salary etc.
persedes 1 days ago [-]
Parent was referring to the cost of hardware. I've had colleagues from Brazil visit the US and go absolutely crazy at Best Buy to grab as much hardware as they could (laptops, Nintendo Switch, etc.), because it's prohibitively expensive for them to buy that at home.
dlisboa 1 days ago [-]
It's not "prohibitively" expensive if they managed to pay for flights to the US, hotels and then buy several pieces of hardware. They definitely had more than enough money to buy whatever they wanted in Brazil. It is, however, much more expensive than it should be, which leaves a bad taste in our mouths and means people will find other ways of acquiring some items.
persedes 1 days ago [-]
It was a business trip, but yeah.
matheusmoreira 22 hours ago [-]
Yeah. Brazilians go nuts when they see US prices. Every family member who travels to the US is showered with purchase requests. Anything they manage to bring back without being taxed essentially receives a 50% discount.
rglullis 1 days ago [-]
As an employee: your taxes are not that high, but public services are terrible so most of middle-class ends up paying for the private alternative as well.
As a business owner: not so bad if you are freelancing or just a few business partners providing some type of service, but terrible the moment you start considering employing other people.
rbanffy 1 days ago [-]
> but public services are terrible
Have you seen the public services of countries with lower taxes? Their public hospitals?
> but terrible the moment you start considering employing other people.
Employing people isn't cheap anywhere (except, perhaps, in the US, where labour rights are kind of nonexistent)
rglullis 1 days ago [-]
I live in Germany. No such thing as public hospitals. And I pay close to 1200€/month in health insurance to the public insurance company.
A quick visit to the dermatologist to check for some tiny bumps that showed up on my forehead: 60€, out of pocket, because the insurer doesn't cover it.
rbanffy 1 days ago [-]
Sad to hear about that. Ireland is much better in that regard - you can pay for private healthcare and it'll provide you a broader network, but you might as well go for public health, where you'll be prioritized based on how life-threatening your condition is.
rglullis 1 days ago [-]
Yeah, I make it sound worse than it seems. The problem of the public insurance is that you pay based on your revenue instead of your actuarial risk, so in the end it should be treated as an extra form of revenue tax. I could go for the private insurance if I wanted to pay less, but then I'd have to switch my kids to the private insurer as well.
All in all, my point was only that the amount of taxes that people pay and quality of services are not necessarily related. Germany has high taxes and expensive-but-adequate healthcare. Greece has high taxes and expensive-and-inadequate healthcare. Switzerland has low taxes and universal/cheap healthcare (max. $5000/year deductible, max charge per hospitalization of $700).
rbanffy 8 hours ago [-]
> you pay based on your revenue instead of your actuarial risk
That's how public health works. It's the same as mortgage insurance in Brazil (where I come from), which is mandatory, and, since it's mandatory, it doesn't consider actuarial risk.
rglullis 6 hours ago [-]
Doesn't make it right or morally virtuous.
throw-the-towel 1 days ago [-]
Wow, 60 euro is cheap! Here in France it would be more like 150.
matheusmoreira 23 hours ago [-]
Import taxes in Brazil are 60%, plus something like 18% on top of the product, shipping and the aforementioned import taxes.
The result is a nearly 100% tax on computers and consumer electronics.
One for you, one for the government.
And it's getting worse. Tariffs on computer hardware were raised only a few months ago.
drdexebtjl 10 hours ago [-]
60% (II) with 18% (ICMS) on top for a total of +88% are the import tariffs for individuals buying devices for personal use, and small businesses using simplified postal/courier regimes.
The tariffs for commercial importations are much lower and depend on the part. For SSDs, for example, II is around 10%. With other fees and ICMS, you're looking at around +60% total. Still high, but not nearly as high.
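For reference, the +88% figure comes from compounding the two rates rather than adding them. A minimal sketch, assuming the 18% ICMS is simply charged on top of the price after the import tax, as described above (the function name is made up for illustration, and the real ICMS calculation rules may differ):

```python
def landed_multiplier(import_tax: float, icms: float) -> float:
    """Price multiplier when ICMS is charged on top of the import-taxed price.

    Assumption: plain compounding, (1 + II) * (1 + ICMS). This mirrors the
    description in the comment, not the official calculation rules.
    """
    return (1 + import_tax) * (1 + icms)


# Individuals: 60% II, then 18% ICMS on top.
individual = landed_multiplier(0.60, 0.18)
print(f"individual: +{(individual - 1) * 100:.1f}%")

# Commercial SSD import: roughly 10% II, then 18% ICMS on top.
commercial = landed_multiplier(0.10, 0.18)
print(f"commercial (II + ICMS only): +{(commercial - 1) * 100:.1f}%")
```

With only the 10% II and the 18% ICMS, the commercial case compounds to about +30%; the rest of the quoted ~+60% would be the other fees, which are not itemized here.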
But large businesses would really prefer that you continue to believe they pay +88% just like you. That way they get to point at the government while keeping their fat margins.
rbanffy 8 hours ago [-]
> But large businesses would really prefer that you continue to believe they pay +88% just like you. That way they get to point at the government while keeping their fat margins.
And, in the meantime, they help push for more "grift-friendly" politicians. For them, it's a win-win situation.
rbanffy 8 hours ago [-]
Don't Dell, HP, and a number of others have manufacturing in Brazil under better tax regimes for the parts? I remember that was one of the points of the Zona Franca de Manaus - build a factory and enjoy tax breaks, mostly for your imports.
Apart from that, this is something that affects the HN crowd and almost nobody else.
jdahlin 1 days ago [-]
Brazil has the opposite of high taxes, especially for company owners. I remember paying 6% on income, compared to up to 70% in Sweden.
matheusmoreira 22 hours ago [-]
Import taxes in Brazil are 60%, plus something like 18% on top of the product, shipping and the aforementioned import taxes.
The result is a nearly 100% tax on computers and consumer electronics. One for you, one for the government.
That 6% figure is just the Simples Nacional rate for micro-businesses making less than 35kUSD/year. The actual income tax tops out at 27.5% at middle class thresholds. On top of that Brazil stacks social security tax, payroll taxes and yet more taxes embedded in every single purchase. If you calculate all of this you can figure out something like up to 70% of a Brazilian's income can flow to the government.
You say Swedish companies pay 70% taxes. Well, Swedish citizens get excellent services and a generally functioning country in return. Brazilian citizens pay 70% taxes and they get... Brazil.
drdexebtjl 16 hours ago [-]
This is very misleading. My salary in Brazil is on the very top end (with most of my income in the 27.5% bracket), and my average effective income tax rate in the last 5 years has been about 16%.
I'm not doing anything creative accounting-wise, I just max out my contributions to retirement accounts (PGBL) and get the correct tax deductions for all medical and education expenses.
We do have high import tariffs for individuals, and especially for consumer goods, as it's been pointed out in a different comment.
This does make it a very expensive country indeed if you want to live your life worshiping consumerism. But if you don't, you'll find that individuals don't really pay that much compared to other countries.
matheusmoreira 15 hours ago [-]
> This is very misleading.
It's your comment that's misleading. I was trying to account for the numberless taxes that exist and get applied to every single transaction. You zeroed in on income taxes then stacked some deductions on top.
> tax deductions
Discounting deductions from the nominal tax rate doesn't change the fact those taxes are high, nor does it change the fact you max out your tax bracket at middle class incomes.
Deductions are actually the bare minimum. If you're using them, it means the state failed to provide you with proper education and health services, forcing you to spend money on things that are theoretically your constitutional rights. Not deducting these expenses would be robbery. The fact most brazilians have plenty of deductions at their disposal is only evidence of how absurdly tax inefficient this country is.
These deductions aren't automatic either, you have to spend time and effort accounting for all of this so that you can make the government give back some of the money it took from you. Time is money, so this is just yet another stealthy tax.
Finally, other countries no doubt have deductions too. I know for a fact that the US does, and european countries almost certainly do too. Accounting for these will probably only make Brazil look even worse by comparison.
> This does make it a very expensive country indeed if you want to live your life worshiping consumerism.
What a dismissive comment.
US government just banned Fable for foreign peasants like us. If you want a computer that can properly run LLMs locally, you're going to be forced to shell out money in the 40-100kBRL range. Computers are in the same price range as cars now.
If you think having some degree of sovereignty over our computing is "worshipping consumerism", then I don't know what to say to you.
Europe is currently fighting tooth and nail to develop some technological independence. China is creating Manhattan projects to catch up to the west in semiconductor manufacturing and kick them out of their supply chains. If we keep up these nonsense taxes, AI will be just yet another area where Brazil is half a century behind.
Brazil taxes foreign products in order to "protect local industry", then it taxes the local industry as well, which means pretty much nothing higher up in the value chain gets made here. Brazilian efforts at creating national computer technology date back to the military dictatorship, to the import substitution policies. The same time period that birthed Lua, in fact. What have we been doing since then? Nothing. Don't have our own industries, and we can't really buy the products produced by other nations either. This is why people leave: Brazil combines the worst of both worlds.
drdexebtjl 11 hours ago [-]
> You zeroed in on income taxes then stacked some deductions on top.
You're the one that brought up a comically inflated 70% number as if it were realistic. You can't act as if the nominal rate is the effective rate, then complain when I bring up numbers based on the effective rate.
> If you're using them, it means the state failed to provide you with proper education and health services, forcing you to spend money on things that are theoretically your constitutional rights.
No, it means I'm picky about my doctors. You seem to have ignored the tax-advantaged retirement accounts, though.
> These deductions aren't automatic either, you have to spend time and effort accounting for all of this so that you can make the government give back some of the money it took from you. Time is money, so this is just yet another stealthy tax.
You just need to ask for receipts and put them in a (digital) folder. Then you spend 5 minutes tops _per *year*_ reporting their sums on your tax forms. If that's not enough, most of the numbers are pre-filled for you, you just have to review it. And you can download past receipts from the federal government's website.
> I know for a fact that the US does, and european countries almost certainly do too. Accounting for these will probably only make Brazil look even worse by comparison.
Then do it. Tax legislation is very different across countries and even municipalities. Comparing nominal tax rates is completely meaningless. You need to compare the effective tax rate.
> If you want a computer that can properly run LLMs locally, you're going to be forced to shell out money in the 40-100kBRL range. Computers are in the same price range as cars now.
What part of that is due to an increase in taxes? Hardware prices have skyrocketed around the world due to limited supply. In fact, there's a record high number of computer hardware parts in the most recent list of products exempt of import taxes.
> If we keep up these nonsense taxes, AI will be just yet another area where Brazil is half a century behind.
Our government is doing exactly that. The latest project in discussion in the Senate will give import tax exemptions and export tax exemptions to data center projects that reserve 10% capacity to the national market, invest 2% locally in R&D, and use clean energy. I think these numbers are ridiculously small.
If we had lower import taxes on data center hardware, how else would the government negotiate with data center companies to reserve capacity for our national interests?
Finally, I think it's a bit silly to think that _you and me_ running agentic coding LLMs at home furthers national interests. It does not. It furthers our hobbies. It's not even the kind of hobby that gives you relevant career experience which then goes on to strengthen our industry.
> The same time period that birthed Lua, in fact.
Lua was created in 1993 in a lab doing research for Petrobrás. I happened to graduate from PUC-Rio, so I know this personally: the Computer Science labs are receiving much more funding nowadays than they did in 1993. They're still cranking out excellent research, and, if I may say so myself, excellent alumni as well.
> What have we been doing since then? Nothing.
- Our electronic voting system;
- Pix, the largest and most popular payment network in the world;
- Elixir, LangFlow, Neovim, just to name a few that you probably know about.
mathattack 2 days ago [-]
Yes. Though even more than the US, their engineering talent from top schools heads into consulting and finance.
cscheid 2 days ago [-]
Yes! That "prefeitura do Rio" huggingface URL is definitely shocking to read to this Brazilian as well (I'm assuming you and parent also are from your usernames).
Aurornis 2 days ago [-]
> 2. The improvement would come from merging the weights PLUS on-policy distillation. The confusion is that the uploaded model didn't have the distillation at all.
They merged the base model with another lab’s fine tuned model. The improvements could have come from getting some of the fine tuned weights from the other model.
If they really had a better performing model that they “accidentally” forgot to upload, they could have uploaded the correct file by now.
I only see an edit to the readme (13h ago) and removal of the weights, so the repo is now empty.
I am willing to give them the benefit of the doubt, but we've seen this before: a model gets released that is supposedly state-of-the-art, yet seems to be another repackaged model without any training. Reflection 70B was the most similar example; all they need now is an API that rewrites "Claude" to "Rio".
motbus3 1 days ago [-]
It seems to me this is clearly a mistake. They would not even have the resources for it as far as I know, and I think they are not even in a position to make such bold claims.
matheusmoreira 14 hours ago [-]
Brazil could easily do it. Fine tuning requires some number of H100 cards. Trivial for the brazilian government. Existing brazilian labs are nothing compared to US hyperscalers but they do have enough capacity to fine tune Qwen. Santos Dumont has 248 H100s + 144 Grace Hoppers.
That's what makes this hilariously sad. Brazil could have done some good work here, but it just didn't. Brazil merged two models on a workstation.
smus 2 days ago [-]
What do you mean World Cup debut? haven't they won 5?
alxndresp 2 days ago [-]
They meant their first, opening game of this current World Cup tournament
s1artibartfast 2 days ago [-]
My understanding is that they didn't do any distillation. Every weight is a 60/40 element-wise average of Qwen and Nex. Is this possible if the Rio contractor did their own post-training as claimed?
> Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen — across all 60 layers and every component of the network. Other finetunes cannot be explained as interpolations.
I find it amazing how robust the current deep learning models are. A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
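A check like the one quoted can be sketched in a few lines. This is only an illustration on synthetic data, not the procedure from the linked issue: the "tensors" are flat lists of random numbers standing in for real weights, and `fit_blend` is a hypothetical helper that fits the single coefficient alpha in merged = alpha * a + (1 - alpha) * b by least squares and reports how much is left unexplained.

```python
import random


def fit_blend(merged, a, b):
    """Fit alpha in merged ~= alpha * a + (1 - alpha) * b (least squares).

    Returns (alpha, relative_residual). A residual near zero means the
    tensor is fully explained as a blend of a and b.
    """
    d = [x - y for x, y in zip(a, b)]        # direction from b to a
    r = [m - y for m, y in zip(merged, b)]   # offset of merged from b
    alpha = sum(di * ri for di, ri in zip(d, r)) / sum(di * di for di in d)
    err = sum((ri - alpha * di) ** 2 for di, ri in zip(d, r)) ** 0.5
    norm = sum(m * m for m in merged) ** 0.5
    return alpha, err / norm


rng = random.Random(0)
n = 4096
base = [rng.gauss(0, 1) for _ in range(n)]              # stand-in for a base tensor
finetune = [w + 0.01 * rng.gauss(0, 1) for w in base]   # stand-in for a finetune
blend = [0.6 * f + 0.4 * w for f, w in zip(finetune, base)]
other = [w + 0.01 * rng.gauss(0, 1) for w in base]      # an unrelated finetune

alpha_blend, res_blend = fit_blend(blend, finetune, base)
alpha_other, res_other = fit_blend(other, finetune, base)
print(f"blend: alpha={alpha_blend:.4f} residual={res_blend:.2e}")
print(f"other: alpha={alpha_other:.4f} residual={res_other:.2e}")
```

The pure blend recovers alpha = 0.6 with a residual at floating-point noise level, while the independently perturbed tensor leaves a residual about as large as its own fine-tuning delta. That gap is what "other finetunes cannot be explained as interpolations" would look like in this toy setting.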
Aurornis 2 days ago [-]
> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
Enhanced it on a couple benchmarks, supposedly.
The game is to turn knobs until you get a benchmark run that shows an improvement, then ship it. There are a lot of fine tunes and chimera models on HuggingFace that are supposedly better at some specific test, but when you use them for anything else they're usually worse.
This happens with a lot of the models that are modified to remove censorship. They succeed in getting the model to emit previously censored outputs, but the overall output quality decreases.
andai 2 days ago [-]
They seem to have deleted most of the README now, but the archived version has benchmarks.
Rio seems to be about halfway between Qwen 3.5 and Nex, as you'd expect?
monster_truck 2 days ago [-]
I don't think your last point is correct. Ablation, when done correctly, seems to increase the quality and typically also the performance too.
Aurornis 2 days ago [-]
Abliteration is a brute force technique that removes or silences parts of the model. It reduces performance because the abliterated elements aren’t perfectly isolated to censorship, so other aspects suffer.
Many of the “uncensored” model providers also do some fine tuning on the models. Some of them target better benchmarks or other measures, but outside of the benchmarks and metrics they’re fine tuned for they are generally noticeably worse than the original model.
yowlingcat 2 days ago [-]
The kind of abliteration you are mentioning is no longer state of the art or the most common form of removing the refusal layer in most models. Your understanding was up to date about a year and a half ago, but has been out of date since then.
weitendorf 2 days ago [-]
Unrelated but I’ve been putting off learning about post-abliteration technique and want to use it for an upcoming open source “retraining” project I have on my backlog. I’m not interested in the refusal layers though, more like deep fine tuning but in a way that might let me prune out or consolidate layers, if that makes sense? Do you have any pointers or links to the current SOTA in this area?
I guess I’m looking for a kind of bulk/sticky dropout (which was in fashion way back when I studied DNN in school).
avadodin 1 days ago [-]
What OP is describing wasn't called abliteration at all.
Abliteration whilst a neologism implies a surgical ablation of refusal.
Earlier approaches post–trained the model to refuse less and, much like other kinds of fine–tuning, it degraded performance. They were "uncensored".
Abliteration has seen some improvement to this day but it always was close to equivalent performance to the original when compared to those earlier techniques.
ls612 2 days ago [-]
Nowadays it is that Heretic tool, is it not? I’ve seen Gemma models uncensored with it.
tredre3 2 days ago [-]
That is something often claimed by heretics. My experience couldn't diverge more, however. All heretic (and abliterix) models I've tried are worse than the original. It's not immediately obvious if all you do is ask 2-3 questions and marvel at how it didn't refuse, but try using them for real over longer 8k+ contexts and it falls apart real fast.
They're more prone to getting stuck in loops, becoming unresponsive, and hallucinating more (presumably because of the reduced desire to not answer).
I've tried all the popular heretic peddlers, but if you have one that you can vouch for maybe I've simply missed it.
antonvs 1 days ago [-]
I'm curious about where you got that idea from. Neither the theory nor the available examples support it. If it did, everyone knowledgeable would be using abliterated models.
manquer 2 days ago [-]
> game is to turn knobs until you get a benchmark run that shows an improvement, then ship it
i.e reinforcement learning against a weak reward function - benchmark is insufficiently complex and is not representative of the real world sufficiently.
The "game", i.e. decision tree can be modeled as a multi-arm bandit problem, to deploy finite resources ( compute) toward exploitation/exploration .
The main issue is each training / fine-tune is very expensive so number of chances at the slot so to speak is pretty limited today.
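To make the bandit framing concrete, here is a minimal epsilon-greedy sketch. Everything in it is an assumption for illustration: the arms stand in for candidate fine-tune or merge recipes, the "reward" is a simulated noisy benchmark score, and the small budget reflects how few expensive runs one can afford.

```python
import random


def epsilon_greedy(true_means, budget, epsilon=0.1, noise=0.05, seed=0):
    """Spend `budget` pulls across arms using epsilon-greedy.

    true_means: hidden average benchmark score per arm (simulation only).
    Returns (counts, estimates): pulls per arm and running mean reward.
    """
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    for _ in range(budget):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))   # explore a random recipe
        else:
            arm = max(range(len(true_means)), key=lambda i: estimates[i])  # exploit
        reward = rng.gauss(true_means[arm], noise)  # one noisy benchmark run
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
    return counts, estimates


counts, estimates = epsilon_greedy([0.50, 0.60, 0.70], budget=50)
print(counts, [round(e, 3) for e in estimates])
```

With a budget this small the estimates stay noisy, which is the point made above: whichever arm happens to score well on the benchmark gets shipped, whether or not it is actually better.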
I don't believe this would work on two LLMs that have different pretraining. Even if it did you would need two LLMs that have exact same internal activation shapes, dimensions, expert counts, token vocabulary, realistically it would never happen outside of finetunes or academic experiments.
hashmap 2 days ago [-]
not this exact thing, no, because the functional circuits dont appear in the same places across models. but if you find where they are you can do something like branch between some of the middle functional circuits between models and it kinda just works, or even do one after the other. you cant just like swap any two layers cause a bunch of em bend hyperbolic curvature to do hierarchical stuff deep in the poincare ball and the geometries get all bonkers, but before and after they do that things are relatively flat, and the geometries are more or less transferrable up to rigid rotation if they're each trained on large enough data.
oofbey 2 days ago [-]
Correct. We used to think that because NN optimization is non-convex there are all these local minima. Now we know that once you get past the very early parts of training from random init, the loss surface is fairly smooth, and not really convex, but close enough in a bunch of ways - linear combinations of trained models are pretty much always valid combinations. You can think of fine tunings as deltas on the original model which can be summed together successfully. I think this paper first showed that to me: https://arxiv.org/pdf/1802.10026 which was 8 years ago now.
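The "deltas that can be summed" idea can be shown on toy numbers. This is a sketch under stated assumptions, not the method of the linked paper: models are plain dicts of flat weight lists, and `task_vector` and `apply_task_vectors` are hypothetical helper names.

```python
def task_vector(finetuned, base):
    """Delta of a fine-tuned model relative to its base, per parameter."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def apply_task_vectors(base, vectors, scale=1.0):
    """Add scaled deltas back onto the base weights."""
    merged = {k: list(v) for k, v in base.items()}
    for vec in vectors:
        for k, delta in vec.items():
            merged[k] = [w + scale * d for w, d in zip(merged[k], delta)]
    return merged


base = {"w": [1.0, 2.0]}
tuned_a = {"w": [1.5, 2.0]}   # fine-tune that moved the first weight
tuned_b = {"w": [1.0, 1.0]}   # fine-tune that moved the second weight

# Sum both deltas onto the shared base.
both = apply_task_vectors(base, [task_vector(tuned_a, base),
                                 task_vector(tuned_b, base)])
print(both)

# A single delta scaled by 0.6 is the same thing as a 0.6/0.4 interpolation
# between the fine-tune and the base.
blend = apply_task_vectors(base, [task_vector(tuned_a, base)], scale=0.6)
print(blend)
```

This only works when the fine-tunes share the same base, so that the deltas live in the same coordinate system.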
woadwarrior01 2 days ago [-]
It's a well-known idea[1], although it's still surprising that something so simple even works.
This team could have stopped here and still had something interesting (albeit not novel) to show. But the hype cycle was too tempting.
itkovian_ 2 days ago [-]
This is called linear mode connectivity and seems to work for almost every large model. So well that in most cases it’s an explicit part of the training process; do many training ‘branches’ then merge then continue.
It is not understood why it works so well.
teravor 1 days ago [-]
is that actually how they train them in the datacenter? the trillion sized weight vector gets cloned and sent off to groups of GPUs and averaged after?
tarruda 2 days ago [-]
What I find fascinating is the idea that there might be a set of "secret" tweaks that when applied to those weights (or even smaller models) could result in an intelligence simulation that could vastly surpass even something like Fable.
OK, I guess they had other clues then. If you do any sort of comparison against Nex and Qwen, a lot of weird coincidences will probably show up if the three sets of weights are somehow not linearly independent, lol.
themafia 2 days ago [-]
> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
Which could be a signal that your "performance" was so abysmal in the first place that even randomly applied training methods can't make it _worse_.
randall 2 days ago [-]
[dead]
meindnoch 2 days ago [-]
It shows that LLMs are an extremely wasteful approach to intelligence.
kristjansson 2 days ago [-]
or that intelligence is merely the composition of many redundant, lossy, ~random components
antonvs 2 days ago [-]
Compared to what?
unrvl22 2 days ago [-]
The municipality of Rio de Janeiro (via its IT company IplanRIO) released Rio-3.5-Open-397B, presented as a homegrown Qwen3.5 fine-tune that beats comparable open models on benchmarks. The linked issue argues it's actually a weighted merge of ~60% Nex-N2 Pro + ~40% Qwen3.5-397B-A17B - Nex-N2 having been released about a week earlier.
DonsDiscountGas 2 days ago [-]
I didn't know model merging like that was possible. (Obviously possible from a pure software standpoint but I'm surprised it's effective)
it works because Nex N2 is also a derivative of the original base Qwen model. If it was two completely unrelated models it wouldn't work.
hypercube33 2 days ago [-]
Even merging models with themselves works, as shown in the post about how they got to the top of Hugging Face with two GPUs.
baobabKoodaa 1 days ago [-]
A few years back these used to be called "Frankenstein models"
Lucasoato 2 days ago [-]
So the problem isn’t in the missing attribution to Qwen, but with the fact that they didn’t mention Nex-N2 Pro right?
Aurornis 2 days ago [-]
The problem is that they claimed to have made a big achievement with their home grown post training, and they expected to receive a lot of praise for it.
Then researchers looked at the weights and there is no post training at all.
They are now attributing both models they merged, but their excuse for the lack of post training is to claim they accidentally uploaded the wrong files.
serial_dev 2 days ago [-]
I’d believe they accidentally uploaded the wrong files if they uploaded the correct ones. To state that they accidentally uploaded something else and then not upload the correct version means they probably do not have anything and either hope people forget about this or they are scrambling to have something that is at least close to their original claim.
evilduck 2 days ago [-]
"Oops, we uploaded the wrong files" is the standard deflection every time people like this get caught.
Look up "Reflection 70B" drama.
clear-octopus 2 days ago [-]
[dead]
vasco 1 days ago [-]
Rio better have the best IT infrastructure and software in the world if they are spending time on LLMs. What a waste of tax payer money.
vitorgrs 1 days ago [-]
Piauí state is also doing an LLM, it seems. But indeed it would make more sense if it was a national thing rather than local...
zinodaur 2 days ago [-]
Oh no, someone is profiting off of their work without proper attribution!?!?
Aurornis 2 days ago [-]
This is an open weights model based on other open weights models.
The dispute is that they released it with claims about having done some post training that improved the outputs. It was discovered that the model was not post trained like they claimed.
The HF page now says it’s a merge of models, which wasn’t there before. They’re trying to claim they accidentally uploaded the wrong model to HF and that they’ll upload the real one soon.
Basically, they thought they could splice two open weights models together and claim their team had accomplished some amazing post training, but they weren’t smart enough to realize that other researchers would discover that there wasn’t any post training.
moritzwarhier 2 days ago [-]
Thanks for the factual clarification. This is so important when everyone already has their trigger finger on politics. Not meaning that politics are irrelevant here, see sister comment by jobim.
But it's impossible to form a nuanced opinion when political association has a higher priority than the facts; which, again, don't look flattering for the implementers.
iknowstuff 2 days ago [-]
How do they just splice two models together?
Aurornis 2 days ago [-]
The Nex N2 model they merged is based on Qwen 3.5, so you can swap pieces of one into the other. They found a combination of the two that did well on some benchmarks and shipped it.
In the early days of Llama there were a lot of experiments like this. There were even some interesting combinations of models where they stacked layers of different models together or even added more layers with interesting results.
But announcing that you spliced two models together isn't very impressive in 2026, so they announced that they had done their own post training and outdid the big labs. They thought nobody would look close enough to notice.
ninja3925 2 days ago [-]
Out of curiosity, how was it discovered? You would have to look for it to find this linear combination.
Aurornis 2 days ago [-]
Check the linked GitHub issue. They explain their process.
Scroll past the first issue to find it. It’s further down.
jdiff 2 days ago [-]
Without the system prompt, asking its name results in it responding with the name of the model they're ripping from. That would certainly draw your eyes to the right places.
valleyer 2 days ago [-]
Why is this? Do labs reinforce the model name during training? I was under the impression that this sort of "self-knowledge" always came from the system prompt, but I guess not...
jdiff 2 days ago [-]
Yes. In this case, during fine tuning. Other blurbs are also baked in during fine tuning that are perfectly reproducible from the Nex model. The details inside the linked issue are quite accessible.
s1artibartfast 2 days ago [-]
How do you feel about the government or government contractors saying they did a bunch of work when they did nothing instead?
carlosjobim 2 days ago [-]
This is a pure scam on taxpayer money. But what else would be expected?
hootz 2 days ago [-]
Apparently no public money was involved.
jdiff 2 days ago [-]
This is contrary to the mayor's words on Twitter.
> An open AI model trained in Rio with public funding over the last year by @Prefeitura_Rio surpassing all other models.
Unlike the big companies who do this, which often are merely impure scams on taxpayer money a little more downstream.
philipallstar 2 days ago [-]
Companies that generate loads of corporation tax, income tax, and VAT revenue are the exact opposite of wastes of public money.
jrm4 2 days ago [-]
Yes, when they do so proportional to what they take, especially as compared to individuals and their tax liabilities.
You'll have to let me know when that finally happens, because that ain't now.
philipallstar 2 days ago [-]
Sorry, I've no idea how to read your first sentence.
Your second one - that's how everything public is paid for. Private individuals pay tax, either through their corporations paying corporation tax or the tax bill on top of their wage bills, which a) drives up prices of the goods and services they offer, or depresses wages, and b) funds all the public sector employees and orgs that don't pay tax (orgs) or don't pay net tax (employees).
jrm4 24 hours ago [-]
Not surprising.
The point of my first sentence is; private individuals and small businesses generally pay their fair share. Larger corporations emphatically do not.
philipallstar 22 hours ago [-]
What's not surprising?
Larger corporations pay loads of tax. Shed loads. They pay all the employee and income tax, as well as corporation tax and their sales generate VAT. Small businesses are the ones most likely to have softer tax burdens due to progressive taxation.
jrm4 3 hours ago [-]
Oh, so you are either deliberately, or ignorantly, spreading utter falsehoods. I think we're done here.
philipallstar 3 hours ago [-]
You're saying "spreading" as though I'm going around saying things to people behind your back. I'm making clear, falsifiable statements, which you've chosen to not falsify. Yours is the poor response, not mine.
carlosjobim 2 days ago [-]
Great, now we're defending embezzlement and fraud with public funds on HN, because we really really hate big business.
A child caught doing something bad will cry "but my friends also did it!", is that the level of reasoning hackers want to be at?
blanched 2 days ago [-]
That seems like a bad faith read to me. Nobody is defending it, just pointing out the irony / hypocrisy. Two things can be bad, and they can be related.
carlosjobim 2 days ago [-]
You'd be surprised to hear then that I'm not the owner of any big company which embezzles taxpayer money, and have never been involved in such.
blanched 2 days ago [-]
I don’t follow how that makes sense as a response to what I said?
carlosjobim 2 days ago [-]
Why would I be a hypocrite for pointing out public fund embezzlement?
blanched 2 days ago [-]
You’re not. The originally mentioned “big companies” are.
sdevonoes 2 days ago [-]
There are no hackers around here anymore. HN is mainly about business nowadays
dmix 2 days ago [-]
HN has always discussed business
jrm4 2 days ago [-]
What part of that said "defense?"
They can both be bad.
lostlogin 2 days ago [-]
> Great, now we're defending embezzlement
I might be missing something, but I don’t see anyone defending the scams.
internet2000 2 days ago [-]
Attribution isn't the relevant part. Lying about your lab's capabilities is.
Planktonne 2 days ago [-]
That's also something all the AI companies have been doing.
dofm 2 days ago [-]
Lying about model capability is right now the lingua franca of the cloud AI business model, almost; they yes-and each other's lies because they are in a position of needing to generate interest, including going as far as needing to trigger regulatory capture.
(It's not news to anyone who has worked in sales-led businesses that salespeople are prone to believing the claims of other salespeople, I guess).
selcuka 2 days ago [-]
> Lying about model capability is right now the lingua franca of the cloud AI business model
Lying about your lab's capabilities != Lying about model capability
Exaggerating the capabilities of a new model that you've actually trained in press bulletins can be called marketing. Merging two models and claiming that you trained a new model is plain lazy.
low_tech_love 2 days ago [-]
They’re using public money to “train” this.
vips7L 2 days ago [-]
Sounds like the whole AI movement.
themafia 2 days ago [-]
It seems to me like the lies are both for the same reason. To capture attention and profits that are not deserved.
functionmouse 2 days ago [-]
leopards ate my face
outside2344 2 days ago [-]
But the whole game is lying and stealing isn't it?
adrian_b 2 days ago [-]
I do not see anyone lying.
The model card says:
> Post-trained from Qwen 3.5 397B
The model card also says that they use an inference framework based on "SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs" by Shi et al.:
They only claim that what they did to "Qwen 3.5 397B" has improved the LLM, including, as expected, with "strong performance in Portuguese".
petu 2 days ago [-]
That's attribution to Qwen team.
There (is/was) no attribution to Nex team (they've released a model based on Qwen 3.5 397B as well).
As per OP link Nex claims that what Rio team released (so far) is just linear interpolation of weights between Nex and OG Qwen model. With no attribution to Nex and zero signs of Rio doing any training of their own.
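For anyone wondering how a claim like "it's just a linear interpolation" can be checked at all, here is a minimal sketch. It is not necessarily the procedure described in the linked issue; the function name and the toy vectors are made up for illustration. The idea: if C = p*A + (1-p)*B, then C - B is an exact multiple of A - B, so a one-parameter least-squares fit leaves a residual of (nearly) zero. Any real training on top of the merge would push the residual up.

```python
import math
from typing import List, Tuple


def fit_interpolation(a: List[float], b: List[float], c: List[float]) -> Tuple[float, float]:
    """Fit c ~ p*a + (1-p)*b and return (p, relative residual).

    a, b, c are flattened weights of the same tensor from three checkpoints
    (hypothetical inputs; real checkpoints hold many such tensors).
    """
    d_ab = [x - y for x, y in zip(a, b)]  # direction from B to A
    d_cb = [z - y for z, y in zip(c, b)]  # how far C moved away from B
    denom = sum(d * d for d in d_ab)
    if denom == 0.0:
        raise ValueError("A and B are identical; p is not identifiable")
    # Least-squares estimate of the mixing coefficient.
    p = sum(u * v for u, v in zip(d_ab, d_cb)) / denom
    # Whatever part of (C - B) is not explained by p * (A - B).
    resid = math.sqrt(sum((v - p * u) ** 2 for u, v in zip(d_ab, d_cb)))
    scale = math.sqrt(sum(v * v for v in d_cb))
    rel = resid / scale if scale > 0.0 else 0.0
    return p, rel


fine_tune = [1.0, 2.0, -1.0, 0.5]   # stands in for the fine-tuned model
base = [0.0, 1.0, 1.0, 0.5]         # stands in for the base model
pure_merge = [0.5, 1.5, 0.0, 0.5]   # exactly halfway between the two
with_training = [0.5, 1.5, 0.0, 1.5]  # same merge, one weight moved further

p_merge, rel_merge = fit_interpolation(fine_tune, base, pure_merge)
p_trained, rel_trained = fit_interpolation(fine_tune, base, with_training)
```

In this toy case the pure merge fits with zero residual, while the "trained" variant does not. On a real checkpoint you would repeat the fit per tensor and look at whether the residuals are consistently negligible.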
00index 2 days ago [-]
Are you talking about the credit that was just updated an hour ago? lol
clear-octopus 2 days ago [-]
[dead]
bachmeier 2 days ago [-]
"Their work"? First you had the original content creators that did 99.99% of the work. Then you had the US companies bundle it up into a frontier LLM. Then "they" did the "work" of using the US model as a foundation for their own. So in the sense of doing 0.00001% of the actual work that went into their product, sure.
I'd say it's more like someone forking a Linux distro, adding a few themes and fonts, and then complaining when someone else forks their distro and adds another theme.
dghlsakjg 2 days ago [-]
That’s the joke.
bachmeier 2 days ago [-]
It isn't. The entirety of the comment I responded to is "Oh no, someone is profiting off of their work without proper attribution!?!?" It's a valid point, but references someone using content created by others for profit. I'm objecting to equating this project with the work done by the original content creators. They're not remotely the same thing.
I understand how the internet works and how people respond to others in this type of setting, but the comment I replied to did not in any way make the point I was making about the disproportionate nature of relative contributions.
idiotsecant 2 days ago [-]
It's time to stop digging
dghlsakjg 2 days ago [-]
> It isn’t
It is.
> I understand how the internet works and how people respond to others in this type of setting, but the comment I replied to did not in any way make the point I was making about the disproportionate nature of relative contributions.
Do you understand?
Jokes aren’t that funny when you have to dig into an explanation on the nuance of why the hidden meaning doesn’t match the surface meaning in exact degree and proportions. That turns a joke into a pedantic comment. And paradoxically muddies the point by explaining it.
We aren’t morons. We understand that Picasso is doing something on a different level than someone feeding bulk scraped JPGs of paintings into a python script. You really don’t have to explain.
bachmeier 1 days ago [-]
Have a nice day.
vasco 1 days ago [-]
> I understand how the internet works and how people respond to others in this type of setting,
You should frame this as a reminder to be more charitable in your positions because sometimes you can be wrong. This subthread ended up being one of the funniest I've read recently.
bwilliams18 2 days ago [-]
That was the joke of the parent comment.
JoshStrobl 2 days ago [-]
That joke really went over your head, huh...
harikb 2 days ago [-]
It is only a problem if you claim it to be an independently developed OS with no attribution to base
idiotsecant 2 days ago [-]
Oof this is delete your post level I think. Sorry bud, I been there.
>The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was upload instead of the final distilled model. We are sorry for the confusion and apologize profusely.
Incidentally are people using Github issues as blogs now?
jonchurch_ 2 days ago [-]
Edit: I didn’t even notice until someone pointed out this was on the Nex-N2 repo, not the Rio one. Now I understand the OP’s confusion!
It wasn’t framed as an issue, which is the norm breakage I think you’re reacting to (as in, they didn’t ask that the README be updated, etc.), but it is common now for folks to use a project’s issue tracker to name and shame them in a place they can’t easily ignore.
Whether that’s right, prosocial, or professional is up for debate (as well as if any single definition of etiquette can be expected in 2026 on an issue tracker).
But surely you can see the optics reason why someone would take their complaint to the repo directly? It pressures the maintainers to respond, it allows for a pile on from the internet, and makes any decision to lock down a hostile thread into its own kind of statement.
The maintainers should absolutely post an official response and lock the thread though, it will likely get ugly in there.
ChoosesBarbecue 2 days ago [-]
But this is posted on Nex's GitHub, not on "Rio de Janeiro's" GitHub.
i.e. this is the maintainer posting on their own GitHub Issues.
jordz 2 days ago [-]
Can someone please explain or link to some information about how models are merged? Is this genuinely merging weights mathematically or some kind of distillation (presumably not if they’ve done zero training as the post suggests).
But yes, in general, merging refers to techniques that directly blend the weights of different models mathematically. It had a big moment of popularity ~2 years ago, with many so-called "Frankenmodels" popping up on leaderboards.
I tend to think of merging as belonging to the same general umbrella as things like "abliteration", or other techniques that surgically modify the weights of a model without a traditional training/tuning loop. Maxime Labonne is a great person to follow if you're interested in this general area.
jxmorris12 1 days ago [-]
There’s nothing to read.
Model A: A_1, …, A_n
Model B: B_1, …, B_n
C_i = A_i * p + B_i * (1 - p)
In other words, it’s just a linear combination of the other models’ weights, per position.
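A minimal sketch of that formula in code, assuming each model is represented as a plain dict from parameter name to a flat list of floats (real checkpoints hold tensors, and the names below are hypothetical):

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, p: float) -> Weights:
    """Blend two checkpoints position by position: C_i = A_i * p + B_i * (1 - p)."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must have identical parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for {name}")
        merged[name] = [x * p + y * (1.0 - p) for x, y in zip(a[name], b[name])]
    return merged


# Toy stand-ins for two models that share one architecture.
model_a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
model_b = {"layer0.weight": [3.0, 6.0], "layer0.bias": [4.0]}

merged = linear_merge(model_a, model_b, p=0.25)
```

With p=0.25 the result sits three quarters of the way toward model B. The key and size checks are there because the formula only makes sense when both models have the same parameters in the same positions.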
joe_the_user 1 days ago [-]
It's been a while since I looked at neural networks in detail. Do all the large models have a close enough architecture that this makes sense? Do they have the same number of layers and width? I had thought that each model had its own "secret sauce" of normal and special layers (convolution, max-pooling, something-something) stacked together. Genuinely curious.
fkozlowski 2 days ago [-]
I'm honestly surprised that they even had the inclination to attempt creating a model. I guess it's bullish that a municipal IT department had the guts to try this?
Havoc 2 days ago [-]
Merges and fine tunes are within reach of individuals with some money to burn so I’m sure a muni can do it
axus 2 days ago [-]
I like the [dead] comment theory that they proposed a huge LLM training budget to the government, kept most of the money, and released a cheap merge to justify the grift.
dormento 2 days ago [-]
This would be so very brazilian of them.
Source: am Huelander.
seba_dos1 2 days ago [-]
It's kinda weird to claim extraordinary results in such case though, as that brings a lot of eyes to it.
mgambati 2 days ago [-]
Nothing weird. The mayor wanted something to brag about. That's Rio, my friend.
fkozlowski 2 days ago [-]
Ah that makes sense
matheusmoreira 2 days ago [-]
That's essentially Brazil's standard operating procedure. Wouldn't be surprising if that turned out to be the case.
Still, I'm actually impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is the last headline I expected to read on HN.
aaronbrethorst 2 days ago [-]
They really missed out by not calling it Neuromancer.
jrm4 2 days ago [-]
“Well, Steve (Jobs), I think it’s more like we both had this rich neighbor named Xerox, and I broke into his house to steal the TV set, but I found out that you had already stolen it.”
-- Bill Gates
ckcheng 2 days ago [-]
What’s more funny to me is the set up to that quote:
> Bill Gates had somehow manifested, alone, surrounded by ten Apple employees. … Steve started yelling at Bill, asking him why he violated their agreement.
And what’s more interesting is the conclusion:
> Apple filed a monumental copyright lawsuit against Microsoft in 1988, but they eventually lost on a technicality (the judge ruled that Apple inadvertently gave Microsoft a perpetual license to the Mac user interface in November 1985).
Microsoft didn’t steal Apple’s GUI … Apple gave it to them.
alexgoodhart 2 days ago [-]
That isn’t fully true is it?
Microsoft claimed that its software’s use of various visualizations related to window state was covered by the 1985 agreement, and Apple claimed that this was not true; those window states were produced by Macintosh while Microsoft’s software was being rendered in the Mac environment.
> In his March 20, 1989 Order, Judge Schwarzer declined to consider whether the visual displays in issue were generated by the Microsoft application programs or by the Macintosh system software. The point arose in connection with Microsoft's argument that the 1985 Agreement licensed to Microsoft all visual displays that could possibly be called up by running the five Microsoft application programs on the Macintosh system software then or in the future. 709 F. Supp. at 929. Judge Schwarzer concluded that Microsoft's contention would "defy common sense." Id.
themafia 2 days ago [-]
Two spoiled rich kids arguing over whose morality is the least worst.
That this moment is held up as some great exchange in business is annoying. That our regulatory agencies are perennially asleep at the switch and allow this nonsense to keep happening is extremely frustrating.
ChrisClark 2 days ago [-]
Held up as some great exchange? No it's two assholes arguing with each other. Just like most Jobs documentaries show him as a terrible person.
Scroll_Swe 2 days ago [-]
[flagged]
themafia 2 days ago [-]
Let me guess, when confronted with uncomfortable information that requires you to think longer than you are used to, you devolve to false dichotomies to defend your ego?
Scroll_Swe 54 minutes ago [-]
Answer the question. My ego? I am not Steve Jobs or Gates.
I live in Sweden but I worry about my country due to online freaks like you. Fair?
wunderlotus 2 days ago [-]
lmao i really hope this is a real quote cuz it’s a banger
One funny thing about incompetence is that incompetent people don't have the competence to know that their incompetence is straightforward to verify by a competent person.
root-parent 2 days ago [-]
You just described every single vibe coder...
vvpan 2 days ago [-]
I think that's unfair to "vibe coding". If anybody explicitly claims to be vibe coding something, then they are admitting to low supervision of the code. And on the contrary, you can also AI-produce code that you have supervised highly. I suppose there are people who both AI their code and push it as bespoke, but I, for one, have not met such a person at or outside of work.
root-parent 2 days ago [-]
>> but I, for one, have not met such a person at or outside of work.
Why would they care? They get their salaries and pensions and bonuses, and the taxpayer is footing the bill.
thimabi 2 days ago [-]
I wouldn’t describe what happened here as incompetence. As a “carioca”, I am pleasantly surprised to know that the government’s IT department is involved in AI work — even without the budget to create its own models from scratch.
antonvs 1 days ago [-]
They could do AI work without trying to lie to the entire rest of the world.
arcticfox 2 days ago [-]
This seems kind of insane though, every time I go to Rio I think of the potential of AI/technology to solve some problems and leave it even more paradisiacal... But working on their own model? Wtf? There are a million applications of existing ones there that should be followed up on instead.
reese_john 2 days ago [-]
It is a testament to the bloat and overreach of the Brazilian state in the economy. Such endeavors should be left to the private sector
thimabi 2 days ago [-]
I disagree. I’d prefer if my government invested more in AI solutions, so as not to depend so much on foreign technology.
In an ideal world, Brazil would have a thriving private sector, capable of competing even in the AI sector. Unfortunately, that’s not the case, and I believe that without government action such endeavors won’t really succeed.
jkwang 1 days ago [-]
This is a concerning pattern. Rebranding merged models as "homegrown" without disclosure undermines trust in open-source AI development. The community needs better provenance tracking and transparency standards for model releases.
thelonelyborg 2 days ago [-]
this is probably occurring all over the world including in startups.
RandyOrion 1 days ago [-]
Please do not claim you trained a new model, only to get caught red-handed by others. Several people or groups have already done that, got caught, and vanished in no time.
Check how the "authors" of "this model" react to this problem [1]. See how they deal with this problem by first changing their affiliation from https://iplanrio.rio.rj.gov.br to https://iplanrio.prefeitura.rio [2], then saying that they are sorry for being caught [3], then just removing all their affiliations once and for all [4].
I think the "authors" of "this model" [5] should be held accountable until they upload new checkpoints, and the performance of the new model is verified by third-parties.
P.S. To people who downvoted me, show me why you're doing this.
It's stupid and hilarious when someone in Rio does it; when a techbro in Silicon Valley does it they get VC funding, a Maserati and an entry on the 30 under 30 list.
rgbrth 1 days ago [-]
I don't think people are saying it's stupid. It's just funny that potentially some random municipality worker is going well beyond their work scope and making contributions in the AI world.
Could be from Rio, could be from any municipality anywhere in the world. The fact that the account is actually from the town hall rather than a personal account also makes it funnier.
rsynnott 1 days ago [-]
> and an entry on the 30 under 30 list.
Ah, yes, the Nobel Prize for Fraud.
(I'm seriously kind of amazed they're still publishing those.)
nicman23 1 days ago [-]
is it any good?
AnotherGoodName 2 days ago [-]
This is fascinating that it worked though. Can we just merge all the open weight models and get something better?
wds 2 days ago [-]
I imagine it'd work the same as merging all the good-tasting foods to get an even tastier one
booleandilemma 2 days ago [-]
[dead]
_3u10 2 days ago [-]
No, they need the same arch, but you can distill them into a single model. And yes, if you use the API directly Claude will often say it’s an open weight model (likely the ones it was distilled from)
nylonstrung 2 days ago [-]
If you go to Civitai, this is pretty much how it works in that corner of the image generation world.
Everything uses Stable Diffusion as the underlying model, and most of what gets used is merges of checkpoints.
avereveard 2 days ago [-]
most merges improve a small subset of "feeling" benchmarks (too small, too specific, or out of distribution) and tend to show degradation on actual benchmarks, with especially punishing results on long chain benchmarks.
also, merging only works on matching architectures (i.e. finetunes/loras of the same model)
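To make the "matching architectures" point concrete, here is a rough sketch of the check you would run before attempting a merge. It assumes each checkpoint has been reduced to a dict from parameter name to shape; the function and parameter names are hypothetical and not taken from any particular merging tool.

```python
from typing import Dict, List, Tuple

Shapes = Dict[str, Tuple[int, ...]]


def merge_blockers(a: Shapes, b: Shapes) -> List[str]:
    """List every reason two checkpoints cannot be merged element-wise."""
    problems: List[str] = []
    for name in sorted(a.keys() - b.keys()):
        problems.append(f"only in A: {name}")
    for name in sorted(b.keys() - a.keys()):
        problems.append(f"only in B: {name}")
    for name in sorted(a.keys() & b.keys()):
        if a[name] != b[name]:
            problems.append(f"shape mismatch: {name} {a[name]} vs {b[name]}")
    return problems


base = {"embed.weight": (1000, 64), "block0.attn.weight": (64, 64)}
finetune = {"embed.weight": (1000, 64), "block0.attn.weight": (64, 64)}
other_arch = {"embed.weight": (1000, 128), "block0.conv.weight": (128, 3)}

same_family = merge_blockers(base, finetune)   # expected to be empty
different = merge_blockers(base, other_arch)   # expected to list problems
```

A fine-tune of the same base passes trivially, which is why merges between siblings are cheap to try. Passing this check only means the arithmetic is possible, not that the merged model will be any good.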
vor_ 2 days ago [-]
Merging related models has been a very common practice for years. See the Stable Diffusion community.
dindunuf 2 days ago [-]
that kinda worked in llama 1/2 era, not between different models but between finetunes of the same model. the briefly legendary Mythomax was IIRC a merge of 5+ tunes, some of which were merges themselves.
yieldcrv 2 days ago [-]
Didn’t the last thread about this have someone from the lab or an enthusiast in Rio saying exactly that?
It's a fine tune of Qwen
Not a conspiracy
daemonologist 2 days ago [-]
The allegation here is that it's not actually a fine-tune of Qwen, but instead an undisclosed mashup (merge) of someone else's fine-tune of Qwen and the original model. Rio subsequently said that the model was in fact a merge, that they did additional fine-tuning after the merge, and that they accidentally uploaded the base merge instead of the version with additional fine-tuning. But this seems like quite an oversight...
yieldcrv 2 days ago [-]
> But this seems like quite an oversight...
Not to me, what would people like to happen? Who are those people? And why do they care?
antonvs 1 days ago [-]
They made a public claim to having produced a useful model, which they published. Turns out they did nothing of the sort.
> why do they care?
Why does anyone ever care about having their time wasted by fraudulent claims?
yieldcrv 1 days ago [-]
Continue to explain like I’m 5 instead of the rhetoricals
FooBarWidget 1 days ago [-]
Can anyone explain to me what a merge is and why that works? It seems utterly bizarre to me that you can just merge weights. You can't make a working program by just merging machine instruction pages. Aren't weights tightly coupled to a specific architecture?
antonvs 1 days ago [-]
In this case both sets of weights ultimately came from the same model. The Nex model they used is a fine-tune of Qwen, which was the other model they used.
I'm not an expert in this area, but it's not too hard to see how a merge like that could turn out ok.
delusional 2 days ago [-]
It's absolutely insane to me that we are now at a point where the top of the front page of hacker news is a random GitHub issue about attribution to some random LLM merge, written in just the most disgusting AI slop style.
I would like to downvote this please.
vor_ 2 days ago [-]
There's been a noticeable drop in quality. It's often a blend of AI culture war posts and arbitrary Github links.
PixComicOS 2 days ago [-]
[flagged]
Aurornis 2 days ago [-]
[dead]
hottrends 2 days ago [-]
[flagged]
flowbarai 2 days ago [-]
[flagged]
jing09928 2 days ago [-]
[flagged]
antii 2 days ago [-]
[dead]
diego_moita 2 days ago [-]
WHAT!? There are thieves in Rio de Janeiro?
Oh, I am so SHOCKED, so SHOCKED! /s
Explaining the joke: in Brazil, Rio de Janeiro is known as "Terra de bandido" (Gangster's Land).
Kinda like Chicago in the 20's or Naples and Palermo in the 90s.
Scroll_Swe 2 days ago [-]
[flagged]
vvpan 2 days ago [-]
Gross historical oversimplifications aside, I am wondering why one would use this as an opportunity to belittle a whole continent.
Scroll_Swe 1 days ago [-]
Some continents are just better, simple as.
antonvs 1 days ago [-]
Have you ever heard of literature?
Scroll_Swe 1 days ago [-]
Yes
elzbardico 2 days ago [-]
[flagged]
guiraldelli 2 days ago [-]
Without evidence, your comment is just bad mouthing.
I have been involved in academia, including in Brazil, and I don't find academia there any more copycat than any other institution, including top tier ones.
boca_honey 2 days ago [-]
This is very easy to prove [1][2]. Brazil has that reputation in the broader academic world, and it's for a reason.
One study about faculty hiring people they know, and the other about high school students cheating on assignments...
What was the original claim again?
dghlsakjg 2 days ago [-]
This was a municipality working with a government associated IT company.
What does it have to do with Brazilian academia?
_3u10 2 days ago [-]
No, typically Brazilians go to Paraguay for their education, most of their technology comes from Paraguay too.
matheusmoreira 2 days ago [-]
No. We go to Paraguay to buy cheaper electronics.
knuppar 2 days ago [-]
muamba garai
dghlsakjg 2 days ago [-]
There’s more than 6x more Brazilian degree holders than there are Paraguayans in total.
That’s a pretty impressive accomplishment.
If true.
cassiogo 2 days ago [-]
What? Never heard of this
stymaar 2 days ago [-]
That sounds like nonsense, they don't even speak the same language in Brasil and Paraguay …
knuppar 2 days ago [-]
that's just a lie lol, stop spreading misinformation
pelasaco 2 days ago [-]
an eternal 7x1.. and I am not talking about Curaçao..
MadrasTh0rn 2 days ago [-]
Not surprised
nom 2 days ago [-]
why not?
diego_moita 2 days ago [-]
It is a recurrent Brazilian meme: Rio is known in Brazil as "terra de bandido" (gangster's land).
The majority of their politicians have ties to organized crime. There is a virtual revolving door between police and crime, where people migrate from one to the other.
It is like Chicago in the 20s, Naples and Medellín in the 80s or Moscow and Culiacán (Sinaloa, Mexico) today.
dormento 2 days ago [-]
Rio is kinda funny as a litmus test - federal government creates laws to try and curb some of the corruption, and Rio produces better and better corrupts - so far Rio is winning.
BTW wasn't it a few months ago the current governor wanted to leave to be able to run as a candidate, so he asked a supreme justice to step in as governor, since there wasn't anyone else that technically could?
brunoarueira 2 days ago [-]
No, he left to be a Senate candidate and his vice governor left in 2025 for another role. The next in line is the president of the Legislative Assembly of the State of Rio de Janeiro, but he was jailed and removed from the role. So the next is a judge from the Justice Tribunal.
alexgoodhart 2 days ago [-]
Somehow I doubt that political affiliations with crime syndicates are affecting heavily the dispositions of LLM developers. The industry itself though is one of incest.
sebastianconcpt 2 days ago [-]
Politicians don't come from outer space, they emerge locally and were raised swimming in an imaginary that has normalized the morals that eventually end up expressed at the top.
afh1 2 days ago [-]
He is putting into question the character of the public workers involved in the project, not that it has anything to do with organized crime. Rio has relapsed into crime in the last decades and government workers in general have a reputation for corruption in Brazil. It's a low-trust society, especially north of Paraná, hence the lack of surprise.
alfiedotwtf 2 days ago [-]
Wasn’t it already obvious given the awfully familiar parameter numbers?
intoXbox 2 days ago [-]
That only tells you what base architecture they used, but fine tuning does not increase the number of weights; it just adapts the weights to perform better on a fine tuning dataset, which is something they claimed they had done.
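A toy illustration of why the parameter count says nothing either way, assuming plain gradient descent on a made-up three-weight model (the numbers are hypothetical):

```python
from typing import List


def sgd_step(weights: List[float], grads: List[float], lr: float) -> List[float]:
    """One gradient-descent update: every weight moves, none are added or removed."""
    return [w - lr * g for w, g in zip(weights, grads)]


base = [0.5, -1.0, 2.0]
tuned = sgd_step(base, grads=[1.0, 0.0, -2.0], lr=0.25)

same_size = len(tuned) == len(base)   # count is unchanged
changed = tuned != base               # values are not
```

So a fine-tune, a merge, and the untouched base model all report the same number of parameters. Telling them apart requires comparing the values themselves.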
Havoc 2 days ago [-]
Nex in turn is also based on qwen so don’t think they’re too far off
https://news.ycombinator.com/item?id=48529544
Anyways SwiTransformer paper looks interesting and doing a post training to optimize for it looks interesting as well.
Rio has a strong engineering talent pool, along with many other major capitals in Brazil
What Brazil doesn't have is a history of properly rewarding talent, which often causes it to migrate elsewhere. So it's definitely surprising when any sort of technological development happens in Brazil: it implies someone who stayed managed to get something done, most likely for much less than what that something is actually worth, while also being crushed by extremely high taxes that essentially double the cost of computer hardware.
I think people are missing the last few words -- cost of computing hardware
when I used to do ISP work I did a lot for LATAM. The joke was that you'd get better bandwidth for Brazil routing out of the country and through Miami than going across the country. The reason? crazy high tariffs on hardware.
No reason to base anything locally, and if you're not basing it locally then there isn't really much reason to stick around, either. Go to other hot markets like Zona America, Austin, CDMX, Miami, Los Angeles, etc. and make the big $$$.
I worked with 2 Brazilian engineers who were in country (and currently work with a 3rd, based in Montreal) and they were very good, but all said they had to get out of the country to lock in the serious engineering roles.
I always find this funny. Brazilian taxes are nowhere near what I would say “high”. I pay about twice as much out of my compensation as I would pay in Brazil, and that would be as if I did zero tax optimisation back then.
Compared to many countries Brazil doesn't have such high taxes (I'd say that if you work remotely for a company outside of Brazil, you'll probably have much lower taxes compared to almost any other country -- working locally the difference isn't as big, but you have higher taxes in many other places).
What it really lacks is access to capital (which is the real "mojo" of the US compared to the rest of the world).
Incorporating and getting a functional business entity in Brazil is harder. In the USA I literally do it in 5 minutes online, including the bank account. In Brazil they are taking out microscopes to verify that the signature on your paperwork matches.
And in the USA if you have one bad employee, just fire them any time. In Brazil, for better or for worse, it's nowhere near as easy. Obviously better for employees, but businesses don’t like it because you can get stuck with an employee dragging down everyone unless you pay them a year's salary etc.
As a business owner: not so bad if you are freelancing or just a few business partners providing some type of service, but terrible the moment you start considering employing other people.
Have you seen the public services of countries with lower taxes? Their public hospitals?
> but terrible the moment you start considering employing other people.
Employing people isn't cheap anywhere (except, perhaps, in the US, where labour rights are kind of nonexistent)
A quick visit to the dermatologist to check for some tiny bumps that showed up on my forehead: 60€, out of pocket, because the insurer doesn't cover it.
All in all, my point was only that the amount of taxes that people pay and quality of services are not necessarily related. Germany has high taxes and expensive-but-adequate healthcare. Greece has high taxes and expensive-and-inadequate healthcare. Switzerland has low taxes and universal/cheap healthcare (max. $5000/year deductible, max charge per hospitalization of $700).
That's how public health works. It's the same as mortgage insurance in Brazil (where I come from), which is mandatory, and, since it's mandatory, it doesn't consider actuarial risk.
The result is a nearly 100% tax on computers and consumer electronics.
One for you, one for the government.
And it's getting worse. Tariffs on computer hardware were raised only a few months ago.
The tariffs for commercial importations are much lower and depend on the part. For SSDs, for example, II is around 10%. With other fees and ICMS, you're looking at around +60% total. Still high, but not nearly as high.
But large businesses would really prefer it if you continued to believe they pay +88% just like you. That way they get to point at the government while keeping their fat margins.
And, in the meantime, they help push for more "grift-friendly" politicians. For them, it's a win-win situation.
Apart from that, this is something that affects the HN crowd and almost nobody else.
The result is a nearly 100% tax on computers and consumer electronics. One for you, one for the government.
That 6% figure is just the Simples Nacional rate for micro-businesses making less than 35kUSD/year. The actual income tax tops out at 27.5% at middle class thresholds. On top of that Brazil stacks social security tax, payroll taxes and yet more taxes embedded in every single purchase. If you calculate all of this you can figure out something like up to 70% of a Brazilian's income can flow to the government.
You say Swedish companies pay 70% taxes. Well, Swedish citizens get excellent services and a generally functioning country in return. Brazilian citizens pay 70% taxes and they get... Brazil.
I'm not doing anything creative accounting-wise, I just max out my contributions to retirement accounts (PGBL) and get the correct tax deductions for all medical and education expenses.
We do have high import tariffs for individuals, and especially for consumer goods, as it's been pointed out in a different comment.
This does make it a very expensive country indeed if you want to live your life worshiping consumerism. But if you don't, you'll find that individuals don't really pay that much compared to other countries.
It's your comment that's misleading. I was trying to account for the numberless taxes that exist and get applied to every single transaction. You zeroed in on income taxes then stacked some deductions on top.
> tax deductions
Discounting deductions from the nominal tax rate doesn't change the fact those taxes are high, nor does it change the fact you max out your tax bracket at middle class incomes.
Deductions are actually the bare minimum. If you're using them, it means the state failed to provide you with proper education and health services, forcing you to spend money on things that are theoretically your constitutional rights. Not deducting these expenses would be robbery. The fact most Brazilians have plenty of deductions at their disposal is only evidence of how absurdly tax inefficient this country is.
These deductions aren't automatic either, you have to spend time and effort accounting for all of this so that you can make the government give back some of the money it took from you. Time is money, so this is just yet another stealthy tax.
Finally, other countries no doubt have deductions too. I know for a fact that the US does, and European countries almost certainly do too. Accounting for these will probably only make Brazil look even worse by comparison.
> This does make it a very expensive country indeed if you want to live your life worshiping consumerism.
What a dismissive comment.
US government just banned Fable for foreign peasants like us. If you want a computer that can properly run LLMs locally, you're going to be forced to shell out money in the 40-100kBRL range. Computers are in the same price range as cars now.
If you think having some degree of sovereignty over our computing is "worshipping consumerism", then I don't know what to say to you.
Europe is currently fighting tooth and nail to develop some technological independence. China is creating Manhattan projects to catch up to the west in semiconductor manufacturing and kick them out of their supply chains. If we keep up these nonsense taxes, AI will be just yet another area where Brazil is half a century behind.
Brazil taxes foreign products in order to "protect local industry", then it taxes the local industry as well, which means pretty much nothing higher up in the value chain gets made here. Brazilian efforts at creating national computer technology date back to the military dictatorship, to the import substitution policies. The same time period that birthed Lua, in fact. What have we been doing since then? Nothing. Don't have our own industries, and we can't really buy the products produced by other nations either. This is why people leave: Brazil combines the worst of both worlds.
You're the one that brought up a comically inflated 70% number as if it were realistic. You can't act as if the nominal rate is the effective rate, then complain when I bring up numbers based on the effective rate.
> If you're using them, it means the state failed to provide you with proper education and health services, forcing you to spend money on things that are theoretically your constitutional rights.
No, it means I'm picky about my doctors. You seem to have ignored the tax-advantaged retirement accounts, though.
> These deductions aren't automatic either, you have to spend time and effort accounting for all of this so that you can make the government give back some of the money it took from you. Time is money, so this is just yet another stealthy tax.
You just need to ask for receipts and put them in a (digital) folder. Then you spend 5 minutes tops _per *year*_ reporting their sums on your tax forms. If that's not enough, most of the numbers are pre-filled for you, you just have to review it. And you can download past receipts from the federal government's website.
> I know for a fact that the US does, and european countries almost certainly do too. Accounting for these will probably only make Brazil look even worse by comparison.
Then do it. Tax legislation is very different across countries and even municipalities. Comparing nominal tax rates is completely meaningless. You need to compare the effective tax rate.
> If you want a computer that can properly run LLMs locally, you're going to be forced to shell out money in the 40-100kBRL range. Computers are in the same price range as cars now.
What part of that is due to an increase in taxes? Hardware prices have skyrocketed around the world due to limited supply. In fact, there's a record high number of computer hardware parts in the most recent list of products exempt of import taxes.
> If we keep up these nonsense taxes, AI will be just yet another area where Brazil is half a century behind.
Our government is doing exactly that. The latest project in discussion in the Senate will give import tax exemptions and export tax exemptions to data center projects that reserve 10% capacity to the national market, invest 2% locally in R&D, and use clean energy. I think these numbers are ridiculously small.
If we had lower import taxes on data center hardware, how else would the government negotiate with data center companies to reserve capacity for our national interests?
Finally, I think it's a bit silly to think that _you and me_ running agentic coding LLMs at home furthers national interests. It does not. It furthers our hobbies. It's not even the kind of hobby that gives you relevant career experience which then goes on to strengthen our industry.
> The same time period that birthed Lua, in fact.
Lua was created in 1993 in a lab doing research for Petrobrás. I happened to graduate from PUC-Rio, so I know this personally: the Computer Science labs are receiving much more funding nowadays than they did in 1993. They're still cranking out excellent research, and, if I may say so myself, excellent alumni as well.
> What have we been doing since then? Nothing.
- Our electronic voting system;
- Pix, the largest and most popular payment network in the world;
- Elixir, LangFlow, Neovim, just to name a few that you probably know about.
They merged the base model with another lab’s fine tuned model. The improvements could have come from getting some of the fine tuned weights from the other model.
If they really had a better performing model that they “accidentally” forgot to upload, they could have uploaded the correct file by now.
https://news.ycombinator.com/item?id=48529544
I am willing to give them the benefit of the doubt, but we've seen this before: a model gets released that is supposedly state-of-the-art, yet seems to be another repackaged model without any training. Reflection 70B was the most similar example; all they now need is an API that rewrites "Claude" to "Rio".
That's what makes this hilariously sad. Brazil could have done some good work here, but it just didn't. Brazil merged two models on a workstation.
https://x.com/tenobrus/status/2066243352211996728/photo/1
I find it amazing how robust the current deep learning models are. A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
Enhanced it on a couple benchmarks, supposedly.
The game is to turn knobs until you get a benchmark run that shows an improvement, then ship it. There are a lot of fine tunes and chimera models on HuggingFace that are supposedly better at some specific test, but when you use them for anything else they're usually worse.
This happens with a lot of the models that are modified to remove censorship. They succeed in getting the model to emit previously censored outputs, but the overall output quality decreases.
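The knob-turning effect described above can be shown with a toy simulation. This is only a sketch: every variant has the same true quality, and the score, noise level, variant count and the `best_of_n_score` helper are all made up for illustration, not taken from any real benchmark.

```python
import random
import statistics

def best_of_n_score(true_score, noise_sd, n_variants, rng):
    """Benchmark n equally good variants once each; return (best run, mean run)."""
    runs = [rng.gauss(true_score, noise_sd) for _ in range(n_variants)]
    return max(runs), statistics.mean(runs)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
best, mean = best_of_n_score(true_score=70.0, noise_sd=1.5, n_variants=20, rng=rng)

# The "shipped" number is the best run, even though no variant is actually better.
print(f"mean of runs: {mean:.2f}, reported best run: {best:.2f}")
```

The gap between the two numbers is pure selection noise, which is one reason a reported benchmark win can fail to carry over to other tasks.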
https://web.archive.org/web/20260614082641/https://huggingfa...
And the Nex benchmarks for comparison
https://huggingface.co/nex-agi/Nex-N2-Pro
Rio seems to be about halfway between Qwen 3.5 and Nex, as you'd expect?
Many of the “uncensored” model providers also do some fine tuning on the models. Some of them target better benchmarks or other measures, but outside of the benchmarks and metrics they’re fine tuned for they are generally noticeably worse than the original model.
I guess I’m looking for a kind of bulk/sticky dropout (which was in fashion way back when I studied DNN in school).
Abliteration whilst a neologism implies a surgical ablation of refusal.
Earlier approaches post–trained the model to refuse less and, much like other kinds of fine–tuning, it degraded performance. They were "uncensored".
Abliteration has seen some improvement to this day but it always was close to equivalent performance to the original when compared to those earlier techniques.
They're more prone to getting stuck in loops, becoming unresponsive, and hallucinating more (presumably because of the reduced desire to not answer).
I've tried all the popular heretic peddlers, but if you have one that you can vouch for maybe I've simply missed it.
i.e. reinforcement learning against a weak reward function: the benchmark is insufficiently complex and not sufficiently representative of the real world.
The "game", i.e. the decision tree, can be modeled as a multi-armed bandit problem: deploying finite resources (compute) toward exploitation/exploration.
The main issue is that each training or fine-tune run is very expensive, so the number of chances at the slot, so to speak, is pretty limited today.
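For anyone unfamiliar with the bandit framing, here is a minimal epsilon-greedy sketch. It is an assumption-laden toy: the "arms" stand in for candidate fine-tuning recipes, `arm_means` are invented true benchmark scores, and `budget` is the small number of runs you can afford. None of this comes from the Rio or Nex teams.

```python
import random

def epsilon_greedy(arm_means, budget, epsilon, rng):
    """Spend `budget` pulls across arms; return per-arm pull counts and mean estimates."""
    counts = [0] * len(arm_means)
    estimates = [0.0] * len(arm_means)
    for _ in range(budget):
        if rng.random() < epsilon or 0 in counts:
            # Explore: random arm (also forced while any arm is still untried).
            arm = rng.randrange(len(arm_means))
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(len(arm_means)), key=estimates.__getitem__)
        reward = rng.gauss(arm_means[arm], 1.0)  # one noisy, expensive benchmark run
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
    return counts, estimates

counts, estimates = epsilon_greedy([0.0, 0.5, 1.0], budget=30, epsilon=0.2,
                                   rng=random.Random(0))
print(counts, [round(e, 2) for e in estimates])
```

With only a few dozen pulls the estimates stay noisy, which is the point made above about the limited number of chances.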
I don't believe this would work on two LLMs that have different pretraining. Even if it did you would need two LLMs that have exact same internal activation shapes, dimensions, expert counts, token vocabulary, realistically it would never happen outside of finetunes or academic experiments.
[1]: https://arxiv.org/abs/2203.05482
It is not understood why it works so well.
Which could be a signal that your "performance" was so abysmal in the first place that even randomly applied training methods can't make it _worse_.
Then researchers looked at the weights and there is no post training at all.
They are now attributing both models they merged, but their excuse for the lack of post training is to claim they accidentally uploaded the wrong files.
Look up "Reflection 70B" drama.
The dispute is that they released it with claims about having done some post training that improved the outputs. It was discovered that the model was not post trained like they claimed.
The HF page now says it’s a merge of models, which wasn’t there before. They’re trying to claim they accidentally uploaded the wrong model to HF and that they’ll upload the real one soon.
Basically, they thought they could splice two open weights models together and claim their team had accomplished some amazing post training, but they weren’t smart enough to realize that other researchers would discover that there wasn’t any post training.
But it's impossible to form a nuanced opinion when political association has a higher priority than the facts; which, again, don't look flattering for the implementers.
In the early days of Llama there were a lot of experiments like this. There were even some interesting combinations of models where they stacked layers of different models together or even added more layers with interesting results.
But announcing that you spliced two models together isn't very impressive in 2026, so they announced that they had done their own post training and outdid the big labs. They thought nobody would look close enough to notice.
Scroll past the first issue to find it. It’s further down.
> An open AI model trained in Rio with public funding over the last year by @Prefeitura_Rio surpassing all other models.
https://x.com/CavaliereRio/status/2065984620626129026
You'll have to let me know when that finally happens, because that ain't now.
Your second one - that's how everything public is paid for. Private individuals pay tax, either through their corporations paying corporation tax or the tax bill on top of their wage bills, which a) drives up prices of the goods and services they offer, or depresses wages, and b) funds all the public sector employees and orgs that don't pay tax (orgs) or don't pay net tax (employees).
The point of my first sentence is; private individuals and small businesses generally pay their fair share. Larger corporations emphatically do not.
Larger corporations pay loads of tax. Shed loads. They pay all the employee and income tax, as well as corporation tax and their sales generate VAT. Small businesses are the ones most likely to have softer tax burdens due to progressive taxation.
A child caught doing something bad will cry "but my friends also did it!", is that the level of reasoning hackers want to be at?
They can both be bad.
I might be missing something, but I don’t see anyone defending the scams.
(It's not news to anyone who has worked in sales-led businesses that salespeople are prone to believing the claims of other salespeople, I guess).
Lying about your lab's capabilities != Lying about model capability
Exaggerating the capabilities of a new model that you've actually trained in press bulletins can be called marketing. Merging two models and claiming that you trained a new model is plain lazy.
The model card says:
> Post-trained from Qwen 3.5 397B
The model card also says that they use an inference framework based on "SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs" by Shi et al.:
https://arxiv.org/abs/2510.05069
So the sources seem properly attributed.
They only claim that what they did to "Qwen 3.5 397B" has improved the LLM, including, as expected, with "strong performance in Portuguese".
There (is/was) no attribution to Nex team (they've released a model based on Qwen 3.5 397B as well).
As per OP link Nex claims that what Rio team released (so far) is just linear interpolation of weights between Nex and OG Qwen model. With no attribution to Nex and zero signs of Rio doing any training of their own.
I'd say it's more like someone forking a Linux distro, adding a few themes and fonts, and then complaining when someone else forks their distro and adds another theme.
I understand how the internet works and how people respond to others in this type of setting, but the comment I replied to did not in any way make the point I was making about the disproportionate nature of relative contributions.
It is.
> I understand how the internet works and how people respond to others in this type of setting, but the comment I replied to did not in any way make the point I was making about the disproportionate nature of relative contributions.
Do you understand?
Jokes aren’t that funny when you have to dig into an explanation on the nuance of why the hidden meaning doesn’t match the surface meaning in exact degree and proportions. That turns a joke into a pedantic comment. And paradoxically muddies the point by explaining it.
We aren’t morons. We understand that Picasso is doing something on a different level than someone feeding bulk scraped JPGs of paintings into a python script. You really don’t have to explain.
You should frame this as a reminder to be more charitable in your positions because sometimes you can be wrong. This subthread ended being one of the funniest I've read recently.
>The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was upload instead of the final distilled model. We are sorry for the confusion and apologize profusely.
Incidentally are people using Github issues as blogs now?
It wasn't framed as an issue, which is the norm breakage I think you’re reacting to (as in, they didn't ask that the readme be updated etc.), but it is common now for folks to use a project’s issue tracker to name and shame them in a place they can't easily ignore.
Whether that’s right, prosocial, or professional is up for debate (as well as if any single definition of etiquette can be expected in 2026 on an issue tracker).
But surely you can see the optics reason why someone would take their complaint to the repo directly? It pressures the maintainers to respond, it allows for a pile on from the internet, and makes any decision to lock down a hostile thread into its own kind of statement.
The maintainers should absolutely post an official response and lock the thread though, it will likely get ugly in there.
i.e. this is the maintainer posting on their own GitHub Issues.
But yes, in general, merging refers to techniques that directly blend the weights of different models mathematically. It had a big moment of popularity ~2 years ago, with many so-called "Frankenmodels" popping up on leaderboards.
I tend to think of merging as belonging to the same general umbrella as things like "abliteration", or other techniques that surgically modify the weights of a model without a traditional training/tuning loop. Maxime Labonne is a great person to follow if you're interested in this general area.
Model A: A_1, …, A_n
Model B: B_1, …, B_n
C_i = A_i * p + B_i * (1 - p)
In other words, it’s just a linear combination of the other models’ weights, per position.
Source: am Huelander.
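The formula above, C_i = A_i * p + B_i * (1 - p), can be sketched in a few lines. This is a toy under stated assumptions: the models are plain dicts of flat float lists rather than real checkpoints, and `merge_weights` and the tensor names are hypothetical, not any library's API.

```python
def merge_weights(model_a, model_b, p):
    """Return C with C_i = A_i * p + B_i * (1 - p) for every weight position."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    # Merging only makes sense when both models have the same tensors and shapes.
    if model_a.keys() != model_b.keys():
        raise ValueError("models must have the same tensor names")
    merged = {}
    for name, a in model_a.items():
        b = model_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch in {name!r}")
        merged[name] = [x * p + y * (1 - p) for x, y in zip(a, b)]
    return merged

# Two tiny made-up "models" with identical layout.
a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
b = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}
c = merge_weights(a, b, p=0.5)
print(c)
```

The key and length checks reflect the point made elsewhere in the thread: this only works when both models share the same architecture, in practice fine-tunes of the same base.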
Still, I'm actually impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is the last headline I expected to read on HN.
-- Bill Gates
> Bill Gates had somehow manifested, alone, surrounded by ten Apple employees. … Steve started yelling at Bill, asking him why he violated their agreement.
And what’s more interesting is the conclusion:
> Apple filed a monumental copyright lawsuit against Microsoft in 1988, but they eventually lost on a technicality (the judge ruled that Apple inadvertently gave Microsoft a perpetual license to the Mac user interface in November 1985).
Microsoft didn’t steal Apple’s GUI … Apple gave it to them.
Microsoft claimed that its software’s use of various visualizations related to window state was covered by the 1985 agreement, and Apple claimed that this was not true; those window states were produced by Macintosh while Microsoft’s software was being rendered in the Mac environment.
> In his March 20, 1989 Order, Judge Schwarzer declined to consider whether the visual displays in issue were generated by the Microsoft application programs or by the Macintosh system software. The point arose in connection with Microsoft's argument that the 1985 Agreement licensed to Microsoft all visual displays that could possibly be called up by running the five Microsoft application programs on the Macintosh system software then or in the future. 709 F. Supp. at 929. Judge Schwarzer concluded that Microsoft's contention would "defy common sense." Id.
That this moment is held up as some great exchange in business is annoying. That our regulatory agencies are perennially asleep at the switch and allow this nonsense to keep happening is extremely frustrating.
I live in Sweden but I worry about my country due to online freaks like you. Fair?
https://www.folklore.org/A_Rich_Neighbor_Named_Xerox.html
https://news.ycombinator.com/item?id=48516679
In an ideal world, Brazil would have a thriving private sector, capable of competing even in the AI sector. Unfortunately, that’s not the case, and I believe that without government action such endeavors won’t really succeed.
Check how the "authors" of "this model" react to this problem [1]. See how they deal with it by first changing their affiliation from https://iplanrio.rio.rj.gov.br to https://iplanrio.prefeitura.rio [2], then saying that they are sorry for being caught [3], then just removing all their affiliations once and for all [4].
I think the "authors" of "this model" [5] should be held accountable until they upload new checkpoints, and the performance of the new model is verified by third-parties.
P.S. To people who downvoted me, show me why you're doing this.
[1] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...
[2] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...
[3] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...
[4] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...
[5] https://huggingface.co/prefeitura-rio
Could be from Rio, could be from any municipality anywhere in the world. The fact that the account is actually from the town hall rather than a personal account also makes it funnier.
Ah, yes, the Nobel Prize for Fraud.
(I'm seriously kind of amazed they're still publishing those.)
Everything is using Stable Diffusion as the underlying model, and most of the usage is merges of checkpoints
also only works on matching architectures (i.e. finetunes/loras of the same model)
It's a fine tune of Qwen
Not a conspiracy
Not to me, what would people like to happen? Who are those people? And why do they care?
> why do they care?
Why does anyone ever care about having their time wasted by fraudulent claims?
I'm not an expert in this area, but it's not too hard to see how a merge like that could turn out ok.
I would like to downvote this please.
Oh, I am so SHOCKED, so SHOCKED! /s
Explaining the joke: in Brazil, Rio de Janeiro is known as "Terra de bandido" (Gangster's Land).
Kinda like Chicago in the 20's or Naples and Palermo in the 90s.
I have been involved in academia, including in Brazil, and I don't find academia there any more copycat than any other institution, including top tier ones.
[1] https://www.sciencedirect.com/science/article/abs/pii/S17511...
[2] https://www.scielo.br/j/aac/a/xNytDrrrHdyK4XPcHBRJZmd/?lang=...
What does it have to do with Brazilian academia?
That’s a pretty impressive accomplishment.
If true.
The majority of their politicians have ties to organized crime. There is a virtual revolving door between police and crime, where people migrate from one to the other.
It is like Chicago in the 20s, Naples and Medellín in the 80s or Moscow and Culiacan (Sinaloa, Mexico) today.
BTW wasn't it a few months ago the current governor wanted to leave to be able to run as a candidate, so he asked a supreme justice to step in as governor, since there wasn't anyone else that technically could?