About a month ago, I decided to add Gemini support to Feeds Fun and researched the top LLM frameworks, because I didn't want to reinvent the wheel.
As a result, I found an embarrassing bug (in my opinion, of course) in the Gemini integration in LlamaIndex. Judging by the code, it is also present in Haystack and in the plugin for LangChain. The root of the problem is in the Google SDK for Python.
When a new Gemini client is initialized, the framework code overwrites the API key in all clients created before it, because by default the API key is stored in a singleton.
This is fatal if you have a multi-tenant application and unnoticeable in all other cases. Multi-tenant means that your application works with multiple users.
For example, in my case, in Feeds Fun, a user can enter their own API key to improve the quality of the service. Imagine what a funny situation could happen: a user enters an API key to process their own news but ends up paying for the tokens spent on all users of the service.
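To make the failure mode concrete, here is a minimal sketch in plain Python. It is not the code of the Google SDK or of any framework: `configure`, `SharedKeyClient`, and `IsolatedKeyClient` are hypothetical names that only imitate the pattern described above, where a client constructor writes the API key into shared global state.

```python
# Hypothetical stand-in for an SDK that keeps the API key in global state.
_global_config = {"api_key": None}


def configure(api_key: str) -> None:
    """Store the key in the process-wide singleton (illustrative only)."""
    _global_config["api_key"] = api_key


class SharedKeyClient:
    """Imitates a framework wrapper that calls configure() on creation."""

    def __init__(self, api_key: str) -> None:
        configure(api_key)  # overwrites the key for every existing client

    def key_used_for_request(self) -> str:
        # The key is read from global state at request time.
        return _global_config["api_key"]


class IsolatedKeyClient:
    """Safer variant: each client keeps its own key on the instance."""

    def __init__(self, api_key: str) -> None:
        self._api_key = api_key

    def key_used_for_request(self) -> str:
        return self._api_key


# Two tenants, each with their own key, sharing one process.
alice = SharedKeyClient("alice-key")
bob = SharedKeyClient("bob-key")
print(alice.key_used_for_request())  # bob-key: Alice's requests are billed to Bob

carol = IsolatedKeyClient("carol-key")
dave = IsolatedKeyClient("dave-key")
print(carol.key_used_for_request())  # carol-key
```

With the shared variant, whoever created a client last pays for everyone; with the isolated variant, the key travels with the client instance, which is what a multi-tenant service needs.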
I reported this bug only to LlamaIndex, as a security issue, and there has been no reaction for 3 weeks. I'm too lazy to reproduce and report it for Haystack and LangChain, so this is your chance to report a bug to a top repository. All the info is below, and reproducing it is not difficult.
This error is notable for many reasons:
Ultimately, I gave up on these frameworks and implemented my own client on top of the HTTP API.
My conclusion from this mess is: you can't trust the code under the hood of modern LLM frameworks. You need to double-check and proofread it. Just because they state that they are "production-ready" doesn't mean they are really production-ready.
Let me tell you more about the bug.
Recently, I unexpectedly encountered the justice system in the USA.
What conclusions can be drawn from this:
I continue developing my news reader: feeds.fun. To gather information and people together, I created several resources where you can discuss the project and find useful information:
So far, there is no one and nothing there, but over time, there will definitely be news and people.
If you are interested in this project, join! I'll be glad to see you and will try to respond quickly to all questions.
Recently, OpenAI released GPT-4o-mini, a new flagship model for the cheap segment, so to speak.
Of course, I immediately started migrating my news reader to this model.
In short, it's a cool replacement for GPT-3.5-turbo. I immediately replaced two LLM agents with one without changing prompts, reducing costs by a factor of 5 without losing quality.
However, then I started tuning the prompt to make it even cooler and began to encounter nuances. Let me tell you about them.
It's hard to impress me as a player and even harder as a game developer. The last time it happened was with Owlcat Games in Pathfinder: Kingmaker, when they added a timer to the game's plot.
But Black Tabby Games managed to do it. And they did it not with some technological complexity but with a visual novel on a standard engine (RenPy), which is cool in itself.
I'll share a couple of thoughts about the game and its narrative structure while the impression is still fresh. I need to think about how to adapt this approach to my projects.
ATTENTION: SPOILERS!
If you haven't played Slay The Princess yet, I strongly recommend catching up: the game takes 3-4 hours. You won't regret it.