I need to scan very large JSONL files efficiently and am considering a parallel grep-style approach over line-delimited text.

Would love to hear how you would design it.
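To make the question concrete, here is a minimal sketch of the kind of design I have in mind: split one large file into byte ranges, re-align each range to a line boundary, and scan the ranges in parallel. All names here are made up for illustration, and the thread pool is just a placeholder (for CPU-heavy patterns a process pool would be the obvious swap):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def scan_chunk(path: str, start: int, end: int, needle: bytes) -> int:
    """Count lines containing `needle` whose first byte lies in [start, end)."""
    hits = 0
    with open(path, "rb") as f:
        if start > 0:
            # Re-align to a line boundary: if the byte before `start` is not a
            # newline, we are mid-line, and the previous chunk owns that line.
            f.seek(start - 1)
            if f.read(1) != b"\n":
                f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            if needle in line:
                hits += 1
    return hits

def parallel_grep_count(path: str, needle: bytes, workers: int = 4) -> int:
    """Split the file into `workers` byte ranges and sum per-range hit counts."""
    size = os.path.getsize(path)
    step = max(1, size // workers)
    bounds = [(i * step, size if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: scan_chunk(path, b[0], b[1], needle), bounds))
```

The boundary rule is the part I most want feedback on: each chunk owns exactly the lines that start inside its byte range, so a line straddling a boundary is counted once, by the chunk where it begins.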

  • Eager Eagle@lemmy.world · 7 hours ago
    1. How many grep-like ops per file?
    2. Is it interactive or run by another process?
    3. Do you know which files ahead of time?
    4. Do you have any control over that file creation?
    5. Is the JSONL append-only? Is the grep running while the file is modified?
    6. How large is very large? 100s of MB? A few GB? 100s of GB? Whether or not it fits in memory could change the approach.
    7. You're using "files", plural; would parallelizing at the file level (e.g., one thread per file) be enough?
    8. How many files and how often is that executed?
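If the answer to question 7 is yes, the whole design collapses to something much simpler: one worker per file, with a plain serial scan inside each worker. A hedged sketch, assuming the files are independent (the helper names are invented, and the thread pool again suits I/O-bound scans; a process pool would replace it for heavy per-line work):

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(path: str, needle: bytes) -> int:
    """Serial scan of one JSONL file: count lines containing `needle`."""
    hits = 0
    with open(path, "rb") as f:
        for line in f:
            if needle in line:
                hits += 1
    return hits

def grep_files(paths: list[str], needle: bytes) -> dict[str, int]:
    """File-level parallelism: one worker per file, results keyed by path."""
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        counts = pool.map(lambda p: count_matches(p, needle), paths)
        return dict(zip(paths, counts))
```

No boundary handling is needed here at all, which is why the answers to questions 7 and 8 matter so much before reaching for anything fancier.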