(bash?) matching Array of Arrays against a simple k:v json at ~200k lines?

𞋴𝛂𝛋𝛆@lemmy.world · edit-2 4 months ago

(bash?) matching Array of Arrays against a simple k:v json at ~200k lines?

INeedMana@piefed.zip · edit-2 4 months ago

If you want ᜴ᚣᜩᗕ to get resolved to jake then bash will be a pain to use. I would use python

For each ASCII letter create a list of non-ASCII characters that look similar. Then, for each word you want to match construct a regex

dictionary = { ...'j': ['j', 'J', '𝚥', '𞋕', ...

regex=''  
for letter in 'jake':  
regex += f'[{"".join(dictionary[letter])}]'

so after whole loop the regex would look a bit like (I’m cutting out a lot of characters to save on copy-pasting) [jJ𝚥𞋕][aA4@][kKᏦ🅺][eE3€]

>>> import re  
>>> r='[jJ𝚥𞋕][aA4@][kKᏦ🅺][eE3€]'  
>>> re.fullmatch(r, 'jake')  
<re.Match object; span=(0, 4), match='jake'>  
>>> re.fullmatch(r, 'joke')  
>>>

In general the group of problems that you are touching here is https://en.wikipedia.org/wiki/String_metric but I’m not sure if there is an algorithm that would be so “visual” matching

𞋴𝛂𝛋𝛆@lemmy.world · 4 months ago

Thanks. This was helpful.