Tried revisiting Ludwig (my Lemmy clone project) today. I've been experimenting with Cursor and AI-assisted development--yes, yes, I know, but it's actually nice for speeding up boilerplate tasks! And it's surprisingly good for C++... so long as you never trust it with memory management.

One thing I left unfinished was full-text search. I had a cobbled-together homegrown LMDB-based search index with sentencepiece as a tokenizer, but it barely worked. So I decided to find a C++ embedded search library. And the pickings are slim.

First choice was CLucene, which sort of works. Cursor helped me figure out the barely-documented API, but also generated a bunch of use-after-frees that I had to sort out. CLucene is 15 years old, though, and it leaks memory like crazy with no fix that I can find. ASan thinks the leaks are coming from within CLucene, so it's probably not my code?

I tried another over-a-decade-old project, Zettair (formerly Lucy). Cursor could translate the autotools build files to Meson, and it worked on the first try, nice! But Zettair can only index files, not in-memory strings...

What else is there? Xapian is GPL, and I want to keep the project Apache-2.0 licensed, so that's out. PISA can also only load files, and it can't add new entries while running. Rust libraries like Tantivy would massively bloat the binary.

As a last resort, I started vibe-coding a translation of Sonic (a very cool Rust search engine that sadly can't be embedded) into a C library, and it didn't take too long to get something working! But it's still more yak shaving. I don't need it, I don't need it...

#Lemmy #Ludwig #Cpp #Rust #CLucene #Cursor

Noé Lopez boosted

My #Guix patch #1229[1] to add Sourcetrail has been merged!! Thank you to @baleine and @sharlatan for reviewing and merging, respectively 😊
Substitutes are available now, so give it a go if you're interested! It's a really nifty tool for exploring C/C++ projects.
[1] https://codeberg.org/guix/guix/pulls/1229
#programming #c #cpp