Comments (10)

squidfunk commented on June 15, 2024

Thanks for reporting. I've run your reproduction and can confirm that when no hyphen is used as the search separator, nothing is found. This is very likely related to #6885 (reply in thread, item 2) and not fixable at the moment for the reasons stated in that comment. However, you might have noticed that we're working on #6307, which will fix this issue as well. I've also run our latest search preview (#6372), and it fixes the issue, allowing you to search with or without a hyphen:

[Screenshot 2024-03-24 at 11.49.40]

If I switch, as you mentioned, to what you call the "original" separator in our current implementation, I can confirm that search works, and I do not observe the item being rendered as the second result:

[Screenshot 2024-03-24 at 11.45.10]

Note that we're working heavily on improving search result ranking as well, which should also be better in #6372. Until then, we're considering this issue as resolvable with a configuration (separator) change. You can follow #6307 for updates on the new search implementation, which should fix many, many shortcomings of the current implementation.
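
To illustrate why the separator matters, here is a minimal Python sketch of separator-based tokenization. It is not the theme's actual tokenizer: the function name `tokenize` and both patterns are assumptions chosen for illustration. The first pattern splits on whitespace and hyphens, the second on whitespace only.

```python
import re

# Hypothetical separator patterns, for illustration only.
WITH_HYPHEN = r"[\s\-]+"     # split on whitespace and hyphens
WITHOUT_HYPHEN = r"[\s]+"    # split on whitespace only


def tokenize(text, separator):
    """Lowercase the text and split it into tokens on the separator regex."""
    return [token for token in re.split(separator, text.lower()) if token]


print(tokenize("universal-image-deploy", WITH_HYPHEN))
# ['universal', 'image', 'deploy']
print(tokenize("universal-image-deploy", WITHOUT_HYPHEN))
# ['universal-image-deploy']
```

With the hyphen in the separator, the name is indexed as three independent terms; without it, the name is a single term that a partial query such as "image" cannot match exactly. In `mkdocs.yml`, the pattern is set through the search plugin's `separator` option.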

from mkdocs-material.

squidfunk commented on June 15, 2024

On another note:

On our system, the performance is 20-30 seconds to actually find the pages.

Is the performance the same if you use the search preview (#6372)? How many pages is your documentation composed of? How long does the build take? Searching should take 20-30 ms, not 20-30 s, and you can help us understand where this comes from by providing more information and, ideally, a test case. Are your docs public?

squidfunk commented on June 15, 2024

Alternatively, if you could share the search/search_index.json file that is located in your site directory after building – that would be a tremendous help. It is public anyway if you deploy your site to GitHub Pages. You can just post the link here, as it would help me better understand what the problem is. If you could also provide some searches that lead to suboptimal results on that dataset, that'd be absolutely amazing and of great help ☺️

galthaus commented on June 15, 2024

@squidfunk - Our docs have 200+ pages. We've split the site into two, but it is still a lot. The build of both sites can take 45 minutes. The problem with the "original" search delimiter is not that things aren't found, but they are biased in the current mechanism to push the set of items down that match the "whole" string. So, universal-image-deploy finds universal and image and deploy and image deploy before universal-image-deploy and that is really annoying. The ordering problem becomes more apparent with lots of pages.

docs.rackn.io is our current site. https://docs.rackn.io/stable/ search universal-hardware - it takes about 3 seconds for the preview window to stabilize. I think the longer times are on slower links and maybe first search.

We are using an older version because I need to figure out how to get the latest to work. I have hacked our docs to make it work for the reproduction case. The current builds fail on our tree because the tag system now seems to not be able to consume tags with hyphens in them. I'll see if I can make a case for that.

Thanks for your feedback. I'll see if I can try the preview.

squidfunk commented on June 15, 2024

Our docs have 200+ pages. We've split the site into two, but it is still a lot. The build of both sites can take 45 minutes.

45 minutes is definitely unexpected. Material for MkDocs' own documentation has more than 90 pages and takes 4 seconds to build. It may be caused by a third-party plugin or extension you're using. It'd definitely be worth debugging this. A good approach is to disable plugins and extensions one by one and see which one is responsible.

The problem with the "original" search delimiter is not that things aren't found, but they are biased in the current mechanism to push the set of items down that match the "whole" string. So, universal-image-deploy finds universal and image and deploy and image deploy before universal-image-deploy and that is really annoying. The ordering problem becomes more apparent with lots of pages.

Yes, ranking is currently not optimal. The existing implementation is based on BM25, which is not ideal for typeahead. The search preview uses a variant of BM25 giving more weight to consecutive matches, so it might already improve the situation. We're working hard on a new ranking method that does not suffer from the problems of BM25.
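
As a rough illustration of the idea, the following Python sketch scores documents with plain BM25 and then multiplies the score of any document that contains the query terms consecutively. It is not the search preview's actual ranking: the function names, the `phrase_boost` factor, the BM25 parameters `k1` and `b`, and the three toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the tokenized query with BM25."""
    n = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


def has_phrase(query, doc):
    """Return True if the query terms occur consecutively, in order, in doc."""
    m = len(query)
    return any(doc[i:i + m] == query for i in range(len(doc) - m + 1))


def rank(query, docs, phrase_boost=2.0):
    """Return document indices, best first, boosting consecutive matches."""
    base = bm25_scores(query, docs)
    boosted = [s * phrase_boost if has_phrase(query, d) else s
               for s, d in zip(base, docs)]
    return sorted(range(len(docs)), key=lambda i: boosted[i], reverse=True)


query = ["universal", "image", "deploy"]
docs = [
    # 0: contains the exact phrase once
    ["install", "the", "universal", "image", "deploy", "workflow", "today"],
    # 1: repeats every term, but never consecutively in query order
    ["universal", "deploy", "image", "universal", "deploy", "image", "guide"],
    # 2: unrelated
    ["unrelated", "page", "about", "tags", "and", "plugins", "only"],
]

print(rank(query, docs, phrase_boost=1.0))  # plain BM25 order
print(rank(query, docs))                    # order with the phrase boost
```

On this toy data, plain BM25 prefers document 1 because it repeats each term, while the boosted variant moves document 0, the one with the exact phrase, to the top. This mirrors the complaint that universal-image-deploy is ranked below pages that merely mention its parts.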

docs.rackn.io is our current site. https://docs.rackn.io/stable/ search universal-hardware - it takes about 3 seconds for the preview window to stabilize. I think the longer times are on slower links and maybe first search.

The search feels reasonably snappy to me. Yes, it could be even faster (and the search preview actually should be), but I don't observe that opening the search modal or searching takes 3 seconds. I'll download your search index and check if I somehow run into pathological cases.

We are using an older version because I need to figure out how to get the latest to work. I have hacked our docs to make it work for the reproduction case. The current builds fail on our tree because the tag system now seems to not be able to consume tags with hyphens in them. I'll see if I can make a case for that.

Yup, 9.2.3 is a little old, but there have not been many changes to search, so don't expect too much when upgrading. However, as mentioned, following #6307 is a good idea, as it will improve the situation. Regardless, it's always a good idea to try and stay updated, since we're iterating fast while trying to keep it as stable as possible.

The current builds fail on our tree because the tag system now seems to not be able to consume tags with hyphens in them.

The tags plugin in Insiders got a complete makeover, as discussed in #6517. If you can narrow the problems down and create a reproduction, we'd be happy if you can create a new bug report so we can fix it ☺️

galthaus commented on June 15, 2024

Sorry. We limit the dev scope to 600 pages to keep build times down. Those 600 pages build in about 21.68 seconds. The full scope of generated docs is 6000 pages, which takes a while to build: 1165.73 seconds. It appears that mkdocs is faster with the latest builds. Still not fun, but getting better. I'll play with plugins and get you a repro on the tags thing. Opened an Insiders ticket for the tag build issue.

galthaus commented on June 15, 2024

Here is the slower site. It has 6000 pages. https://refs.rackn.io/stable and search using the preview for universal-hardware or universal-discover. It appears to take 20 seconds to stabilize. The latest tree (but not the search rewrite) is faster, but still takes 10 seconds or so to stabilize. It flashes through sequences. My guess is that it is threaded and is processing the keystrokes and bounces. The latest tree does sort better (well, a little). It depends upon the search term.

squidfunk commented on June 15, 2024

Here is the slower site. It has 6000 pages.

6,000 pages is a whole other level, so it sounds legit that this takes longer. Just as an idea to cut down on build time: you might try to enable navigation pruning, which, depending on how you structured the site, might help in cutting down the size and time of the build, because the navigation plays a large role. Also see #1887 for reference.

Thanks for opening the ticket, we'll look into it.

I'm not surprised. Your search index is 40 MB, so you pretty much reached the end of client-side search, as you're shipping this index to every user. We haven't announced this yet, but we'll likely be offering the ability to provide server-side search and fully integrate it with the search interface in the near future. Additionally, we'll be exploring alternative methods of breaking down the index in order to ship smaller chunks to the user, and not the entire thing. A site of this size is just not suited anymore for full client-side search.

To sum up: we are very aware that search degrades as a site grows, and we will actively address this after we've shipped the first iterations of the new search interface. Our vision is to provide an awesome experience from 1 to 10,000 pages. Please note that this is a pretty big fish to fry, but we're working hard on it.

[Screenshot 2024-03-26 at 08.44.35]

Based on this search index, could you share some searches + the results you would expect and how they are sorted? That would allow us to better test it.

squidfunk commented on June 15, 2024

Thanks again for sharing your site. It helps a lot in gravitating towards a better search implementation ☺️

When I run my current prototype on the 40 MB search index of your site, indexing takes around 2-3 seconds and searching takes less than 100 ms on average, which includes searching, ranking (please ignore score = 0 in the video below), ordering, highlighting and pagination. It looks very promising and feels quite snappy, given that there are 6,000 documents, each with multiple sections, leading to a total of 16,000 items in the search index.

[Video: Ohne.Titel.mp4]

When entering a few characters, many, many results are returned, which might bury what you actually search for among many similar results. In this case, a scoped search might be a better idea, in order to prune the number of potential results prior to searching by a categorical system like tags or site subsections (Blog, Reference, etc.).
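
A scoped search could look roughly like the Python sketch below, which discards items outside a given site subsection before matching. It is only an assumption about how such pruning might work, not the planned implementation: the function `scoped_search`, the `scope` prefix, and the item fields (`location`, `title`, `text`) are chosen for illustration.

```python
def scoped_search(items, query, scope=None):
    """Return locations of items that contain every query term.

    If scope is given, items whose location does not start with that
    prefix are pruned before any matching takes place.
    """
    terms = query.lower().split()
    hits = []
    for item in items:
        if scope is not None and not item["location"].startswith(scope):
            continue  # pruned: outside the requested subsection
        haystack = (item["title"] + " " + item["text"]).lower()
        if all(term in haystack for term in terms):
            hits.append(item["location"])
    return hits


# Hypothetical items, for illustration only.
items = [
    {"location": "reference/universal/", "title": "Universal", "text": "universal hardware workflow"},
    {"location": "blog/2024/universal/", "title": "Blog post", "text": "universal hardware announcement"},
    {"location": "reference/tags/", "title": "Tags", "text": "tag listing"},
]

print(scoped_search(items, "universal hardware"))
# ['reference/universal/', 'blog/2024/universal/']
print(scoped_search(items, "universal hardware", scope="reference/"))
# ['reference/universal/']
```

Pruning first keeps the candidate set small, so a short query returns fewer near-duplicate results from unrelated parts of the site.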

All of this is currently in flux, and I'll be regularly testing your search index. Please note that a search index with 16,000 items is far, far beyond what we've observed in a site so far, so it might take some time to get this right, but I can assure you that it is on our agenda.

Edit: prior comment said 26,000, but it's 16,000 items and 26,000 distinct terms. Sorry for that. It's, however, still the biggest search index we've seen so far.
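
For anyone who wants to check figures like these on their own site, here is a small Python sketch that summarizes a search index. It assumes the index is a JSON object with a `docs` list whose entries carry `location`, `title`, and `text` fields, and it counts terms with a simple whitespace-and-hyphen split, so the numbers are approximate and will not match the search engine's own term count exactly. The function name `summarize_index` is hypothetical.

```python
import json
import re


def summarize_index(index, separator=r"[\s\-]+"):
    """Return rough size statistics for a search index dictionary."""
    docs = index["docs"]
    # Sections share their page's location up to the '#' anchor.
    pages = {doc["location"].split("#", 1)[0] for doc in docs}
    terms = set()
    for doc in docs:
        for field in ("title", "text"):
            value = doc.get(field, "").lower()
            terms.update(t for t in re.split(separator, value) if t)
    return {
        "items": len(docs),
        "pages": len(pages),
        "distinct_terms": len(terms),
        "bytes": len(json.dumps(index).encode("utf-8")),
    }


# Tiny hypothetical index, for illustration only.
sample = {
    "docs": [
        {"location": "a/", "title": "A", "text": "alpha beta"},
        {"location": "a/#s1", "title": "Sec", "text": "beta-gamma"},
        {"location": "b/", "title": "B", "text": ""},
    ]
}

stats = summarize_index(sample)
print(stats["items"], stats["pages"], stats["distinct_terms"])
# 3 2 6
```

To run it on a real build, load the `search/search_index.json` file from the site directory with `json.load` and pass the result to `summarize_index`.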

galthaus commented on June 15, 2024

Glad it could help. I'll look at navigation pruning.
