activistPnk@slrpnk.net

The very first task grep was created for was to search text files containing natural language – not code. So from the very beginning they screwed up by making newlines the delimiter. Sure, they had hardware limitations back then… probably 64k RAM. But obviously that limitation no longer applies.

For searching for a word pair or phrase, grep and pdfgrep both miss situations where a linebreak falls within the phrase.
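To make the failure concrete, here is a minimal sketch of a phrase search that treats any run of whitespace, including a newline, as a word separator. The function name `find_phrase` and the sample text are made up for illustration. It does not handle words hyphenated across a line break, and it is not how grep or pdfgrep work internally.

```python
import re

def find_phrase(text: str, phrase: str) -> list[str]:
    """Return every occurrence of `phrase` in `text`, allowing any whitespace
    run (spaces, tabs, newlines) between the words of the phrase."""
    words = phrase.split()
    # Escape each word, then let one or more whitespace characters join them.
    pattern = r"\s+".join(re.escape(w) for w in words)
    return [m.group(0) for m in re.finditer(pattern, text, flags=re.IGNORECASE)]

# Hypothetical sample: the phrase is split by a line break.
sample = "The court held that due\nprocess was violated."
print(find_phrase(sample, "due process"))
```

A line-oriented search for "due process" finds nothing in that sample, because neither line contains the whole phrase.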

We need a tool that caters for searches on natural language. All PDFs and most text files would be applicable. Inspiration should come from this page:

https://libguides.law.drake.edu/LexisWest

Notice those costly commercial tools enable users to specify a query that matches any 2+ patterns that occur in the same sentence or paragraph. You can specify a max distance in terms of number of words. You can specify the word order or specify that word order does not matter.
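A rough sketch of what that kind of query could look like, assuming a naive sentence splitter (break after `.`, `!` or `?`) and exact word matches. The names `split_sentences` and `near` and their parameters are hypothetical, not the syntax of any existing tool. Distance is counted as the difference in word positions, so adjacent words are 1 apart.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Collapse all whitespace (including line breaks), then split naively
    after sentence-ending punctuation. Abbreviations will fool this."""
    flat = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]

def near(text: str, first: str, second: str,
         max_words: int = 5, ordered: bool = False) -> list[str]:
    """Return sentences where `first` and `second` occur within `max_words`
    word positions of each other. With ordered=True, `first` must come
    before `second`."""
    hits = []
    for sentence in split_sentences(text):
        # Lowercase each word and strip surrounding punctuation.
        words = [w.strip(".,;:!?\"'()").lower() for w in sentence.split()]
        pos_a = [i for i, w in enumerate(words) if w == first.lower()]
        pos_b = [i for i, w in enumerate(words) if w == second.lower()]
        if any(1 <= (b - a if ordered else abs(b - a)) <= max_words
               for a in pos_a for b in pos_b):
            hits.append(sentence)
    return hits

# Hypothetical sample with a line break inside the first sentence.
text = ("The landlord failed to\nreturn the deposit. The tenant moved out. "
        "A deposit was paid by the landlord.")
print(near(text, "landlord", "deposit", max_words=5))
print(near(text, "landlord", "deposit", max_words=5, ordered=True))
```

The first call matches both sentences that mention the two words; the second matches only the one where "landlord" comes first. Paragraph scope would work the same way with a splitter on blank lines instead of punctuation.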

I have to say it’s a bit baffling that such a basic need is still unmet in the FOSS world, unless I’m missing something. AFAIK, all we have are hacks.

grep and pdfgrep need to evolve or be replaced with something that can ignore line breaks and search sentences and paragraphs