• 0 Posts
  • 8 Comments
Joined 1 year ago
Cake day: August 8th, 2023

  • Did anybody bother to look at the numbers?

    I checked the stats for the last 4 years here and it looks really strange. Statistics isn’t my thing… But it looks like it’s wise to be cautious and not to fully trust the numbers.

    Around the beginning of last year there was a huge dip in the Windows market share that seemed to correlate with a peak in “unknown”. Windows then caught up in a somewhat erratic way.

    Mac OS also shows weird behavior. It starts at 16%, goes up to 21% and then down to 14% between October and November…

    It’s not likely that a huge number of people decided to buy a Mac and then trash it one month later. The same, but in the opposite direction, goes for the Windows stats.

    To me it looks like the uncertainty in these numbers is larger than the total market share Linux is shown to have…

    I’m not saying that Linux isn’t increasing its desktop market share. I’m just saying that the numbers seem to have quite a large error margin, so be cautious when referring to them.



  • I recommend you shrink the Windows partition on the internal drive and install Linux in the space that frees up. The extra disk you have can be used as an extra disk, or you can create mount points on it for /home and other directories.

    Microsoft does not recognize other operating systems as “equals” (WSL is not Linux being weak. It’s making Linux a puppet controlled by Windows), so they design everything in Windows as if it were the only OS in the world. Keeping Windows will therefore often require some extra acrobatics from you.



  • How good is “good”, would you say?

    We got pretty good results with CER at 4% and WER at 15%!

    This was on a limited dataset used for both training and testing, which most likely means that if you tested on an even larger dataset with greater variation in handwriting style, the numbers might be even worse.

    Very simplified: a risk of one wrong character in every 25 characters and one wrong word in every 7 words. The SER was around 20%.
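
    For anyone who wants to check numbers like these on their own data, here is a minimal sketch of how CER and WER are commonly computed: edit distance between the recognized text and the ground truth, divided by the length of the ground truth. The function names and the sample strings are made up for illustration and are not taken from our project.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits per reference character."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits per reference word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def one_error_every(rate):
    """Very simplified reading of a rate: one error per this many units."""
    return 1 / rate


# Hypothetical example strings, one misread character ("w" read as "v").
truth = "the quick brown fox"
ocr_output = "the quick brovn fox"
print(f"CER: {cer(truth, ocr_output):.3f}")
print(f"WER: {wer(truth, ocr_output):.3f}")

# The rates quoted above, turned into "one error every N units".
print(f"4% CER  -> one wrong character in about {one_error_every(0.04):.0f}")
print(f"15% WER -> one wrong word in about {one_error_every(0.15):.0f}")
```

    Note how a single misread character costs much more in WER than in CER, which is why the word-level number is always the uglier one.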

    There’s a reason why no one has released a good model for western letters yet and why companies pay up to 1€ for capturing data from 10 handwritten pages.

    It will come but OCR isn’t as sexy as developing text2image solutions.


  • To train an AI to recognize handwriting you need a huge dataset of handwriting examples. That is, millions of samples of handwritten text, plus information about what the written text says in every example.

    This is why the best engines only exist as a service in the cloud. The OCR engines you can install locally that are acceptable, but far from perfect, are commercial. Parascript FormXtra is one of the better commercial ones.

    The only OCR engine that’s free and really good is Tesseract OCR, but it doesn’t handle handwritten text.