Mynij - search faster, offline

Mynij Milestones 1 and 2: what is the limit of FlexSearch?

We already know that Mynij can hold an index of all bee species, of which there are 20 thousand. But what about the 150 million different references on Amazon? Can they fit inside Mynij, i.e. inside the web browser's RAM and storage on a typical smartphone?
  • Last Update: 2021-03-01
  • Version: 001
  • Language: en

Mynij is an experimental offline Web search engine based on JIO and FlexSearch. The long-term goal of Mynij is to provide relevant and accurate search results faster than traditional online Web search engines by searching indices that are personalised for each user.

The goal of this milestone was to write a test and build a test environment that automatically measure disk and RAM usage, as well as execution time, as a function of the number of entries added to the Mynij search engine. With these results, we can determine what kind of data we can expect Mynij to index. For example, we already know that Mynij can hold an index of all bee species, of which there are 20 thousand. But what about the 150 million different references on Amazon? Can they fit inside Mynij, i.e. inside the web browser's RAM and storage on a typical smartphone?
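To make the idea of such a measurement concrete, here is a minimal sketch in Python. It is not the Mynij test suite and does not use FlexSearch: the toy inverted index, the synthetic entries, and the function names build_index and measure are all assumptions made for illustration. It only shows the shape of the experiment: build an index of a given size, then record build time, peak memory, and serialized size.

```python
import json
import time
import tracemalloc


def build_index(n_entries):
    """Build a toy inverted index (token -> list of entry ids).

    Stand-in for a real search index; the entries are synthetic.
    """
    index = {}
    for entry_id in range(n_entries):
        # Hypothetical entry text, not real Mynij data.
        text = f"species {entry_id} genus {entry_id % 100}"
        for token in text.split():
            index.setdefault(token, []).append(entry_id)
    return index


def measure(n_entries):
    """Return build time, peak traced memory and serialized size for one index size."""
    tracemalloc.start()
    start = time.perf_counter()
    index = build_index(n_entries)
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # The size of the JSON export is used here as a rough proxy for disk usage.
    serialized_bytes = len(json.dumps(index).encode("utf-8"))
    return {
        "entries": n_entries,
        "seconds": elapsed,
        "peak_ram_bytes": peak_bytes,
        "serialized_bytes": serialized_bytes,
    }


if __name__ == "__main__":
    # Repeat the measurement for growing index sizes to see how each cost scales.
    for n in (1_000, 10_000, 100_000):
        print(measure(n))
```

In a browser the same three quantities would have to be read through browser-specific means, so the absolute numbers from a sketch like this say nothing about a smartphone; only the method carries over.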

All results can be found online here: https://alpha.iodide.io/notebooks/3633/?viewMode=report

The source code to produce results can be found here: https://lab.nexedi.com/ARogova/Mynij-unit-tests

We also tested the RSS and Sitemap parsers that Mynij relies on to index websites: https://alpha.iodide.io/notebooks/3900/?viewMode=report

The simplified conclusion is this: with the current implementation of FlexSearch, Mynij can store 100K to 300K entries per index on a smartphone. It is also possible to import or export about 100K entries in a matter of seconds to minutes. Within 10 years, we can expect these figures to grow to a million entries, and maybe even 10 million.

This is enough for all bee species but not enough for all Amazon references.
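The gap can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this article (20 thousand bee species, 150 million Amazon references, and per-index capacities from 100K up to the projected 10 million); the helper name indices_needed is hypothetical, and splitting a dataset across several indices is an assumption made for the comparison, not something the tests evaluated.

```python
BEE_SPECIES = 20_000              # figure quoted in the article
AMAZON_REFERENCES = 150_000_000   # figure quoted in the article

# Entries per index: measured range today, then the 10-year projections.
CAPACITIES = (100_000, 300_000, 1_000_000, 10_000_000)


def indices_needed(total_entries, entries_per_index):
    """Number of full-size indices required to hold total_entries (ceiling division)."""
    return -(-total_entries // entries_per_index)


for capacity in CAPACITIES:
    print(
        f"{capacity:>10} entries/index: "
        f"bees need {indices_needed(BEE_SPECIES, capacity)}, "
        f"Amazon needs {indices_needed(AMAZON_REFERENCES, capacity)}"
    )
```

Under these assumptions the bee species fit in a single index at every capacity, while the Amazon references would need between 500 and 1,500 indices at today's capacity and still 15 at the most optimistic projection.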