search page ranking #4

Open
mikelove opened this issue Sep 15, 2014 · 15 comments

Comments

@mikelove

I can't tell what the ranking here is based on

https://support.bioconductor.org/local/search/page/?q=

it doesn't appear to be reverse chronological, though that would be the best default ranking IMO

@dtenenba

For me that url does not show any posts.

@dtenenba

However, if you go to

https://support.bioconductor.org/t/Latest/

You'll see that the default sort is "new answers". Open that dropdown for other possibilities. Does that help?

@mikelove

oh sorry, i gave a bad example. I meant like this one:

https://support.bioconductor.org/local/search/page/?q=DESeq2

@hpages

hpages commented Sep 17, 2014

Related to the search, it doesn't seem to be looking at the titles of the threads e.g.

https://support.bioconductor.org/local/search/page/?q=Bioconductor+2.14+is+released

doesn't find the post with subject "Bioconductor 2.14 is released".

@dtenenba

@ialbert do you know what's going on here? We're using the haystack search if that matters. Thanks.

@ialbert

ialbert commented Sep 17, 2014

Words are interpreted individually, not as a phrase.

Haystack and the search engine that it connects to behind the scenes are quite complex beasts that I am sure could be configured in every which way, but we have not had the manpower to do it yet. So the search is ranked by relevance, and how relevance is computed again depends on the engine.
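
To illustrate the difference between per-word and phrase matching described above, here is a minimal Python sketch. It is not Biostar or Haystack code: the whitespace tokenizer and both function names are assumptions made up for this illustration, and a real engine also ranks the hits rather than just returning true or false.

```python
def tokenize(text):
    """Hypothetical tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


def matches_any_term(query, document):
    """Per-word matching: True if at least one query word is in the document."""
    doc_terms = set(tokenize(document))
    return any(term in doc_terms for term in tokenize(query))


def matches_phrase(query, document):
    """Phrase matching: True only if the query words occur consecutively, in order."""
    q, d = tokenize(query), tokenize(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


query = "Bioconductor 2.14 is released"
exact = "Bioconductor 2.14 is released"
loose = "see Bioconductor where the paper is released"
```

With per-word matching both `exact` and `loose` count as hits for `query`, while phrase matching accepts only `exact`. That is one reason a multi-word query can return many posts that merely share common words with it.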

@hpages

hpages commented Sep 17, 2014

Yes I guess it's pretty clear that words are interpreted individually, given how the individual words get highlighted in the bodies of the results. My concern is that nothing gets highlighted in the titles of the posts, suggesting that the titles are not being searched. It would make sense that the titles are searched before the bodies.

@hpages

hpages commented Sep 17, 2014

Also searching for "3.1.1"

https://support.bioconductor.org/local/search/page/?q=3.1.1

or for "v3.1.1"

https://support.bioconductor.org/local/search/page/?q=v3.1.1

doesn't find the "R package not available for R v3.1.1" thread:

https://support.bioconductor.org/p/61052/

@ialbert

ialbert commented Sep 17, 2014

First, always look at the right sidebar: if you see the words "Nothing
matches yet" in the Similar Posts tab, it almost always means that this
post has not been indexed. Such posts will not show up in search results.

The default celery task indexes new posts every 15 minutes, so new posts
may not show up right away. But this seems to be an older post, so maybe the
index should be refreshed. This can be done from the command line:

./biostar.sh index

this will recreate the search index.

Now there is also a more general answer as to how the search works.

Biostar does not actually perform the search, it passes down the query
into a third party engine that runs behind the scenes. There is support for
many different search engines.
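
As a rough sketch of what "passing the query down to an engine" looks like on the configuration side, django-haystack selects its backend through the `HAYSTACK_CONNECTIONS` setting. The values below (index path, URL, index name) are placeholders, and which backend a given Biostar deployment actually uses is an assumption here, not something stated in this thread.

```python
import os

# Placeholder location for an on-disk Whoosh index (assumed path).
BASE_DIR = os.path.dirname(os.path.abspath(__file__)) if "__file__" in globals() else os.getcwd()

# Option A: Whoosh, a pure-Python engine that stores the index on disk.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        "PATH": os.path.join(BASE_DIR, "whoosh_index"),
    },
}

# Option B: Elasticsearch, running as a separate service.
HAYSTACK_CONNECTIONS_ELASTICSEARCH = {
    "default": {
        "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
        "URL": "http://127.0.0.1:9200/",  # placeholder host and port
        "INDEX_NAME": "haystack",         # placeholder index name
    },
}
```

Swapping the `ENGINE` entry changes which engine performs the search, and with it the default ranking and tokenization behavior.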

But the way the search is implemented and performed by the engine can
usually be customized in so many ways that it is a task of its own.
For example, here is Elasticsearch:

http://www.elasticsearch.org/

I never really had time to delve into all the details of word stemming,
capitalization, punctuation etc., so right now we just take the engine's
default behavior and run with that. Customizing the search can be done
independently of Biostar; it all depends on the schema and the custom
parameters passed to the engine.
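
To make the point about punctuation defaults concrete, here is a small Python sketch of how two different tokenization schemes would treat a version string such as "v3.1.1". Both tokenizers are hypothetical stand-ins invented for this example; they do not reproduce the analyzer of any particular engine.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace only, so 'v3.1.1' stays a single token."""
    return text.lower().split()


def punctuation_tokens(text):
    """Split on every non-alphanumeric run, so 'v3.1.1' becomes 'v3', '1', '1'."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def all_terms_match(query, document, tokenize):
    """True if every token of the query is also a token of the document."""
    doc_tokens = set(tokenize(document))
    return all(t in doc_tokens for t in tokenize(query))


title = "R package not available for R v3.1.1"
```

In this toy model the query "v3.1.1" matches `title` under both schemes, while "3.1.1" matches under neither, because the title contains the token "v3.1.1" (or "v3"), not "3.1.1" (or "3"). A real engine may behave differently, which is why the outcome depends on the schema and parameters mentioned above.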


@ialbert

ialbert commented Sep 17, 2014

also I do agree that good search is essential and should be a priority, we'll make it one

@ialbert

ialbert commented Sep 17, 2014

another detail I forgot to mention. Titles are searched for posts that have titles (top level posts) but the title is not treated in any special way. Basically when there is a title it is treated as if it were the first line of the post.

Now it won't highlight the title in the link since that comes from a different source.

For example see:

https://support.bioconductor.org/local/search/page/?q=diffbind

The first hits show cases where the title is actually searched and shown as the first line of the post.
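
The behavior described above, where the title is simply treated as the first line of the post, can be sketched in Python as follows. The function names and the `<b>` highlight markup are assumptions for illustration only; the real index template and highlighter in Biostar may differ.

```python
import re


def build_indexed_text(title, body):
    """Prepend the title, when there is one, as the first line of the searchable text."""
    return f"{title}\n{body}" if title else body


def highlight(text, term):
    """Wrap every case-insensitive occurrence of term in <b> tags."""
    return re.sub(
        re.escape(term),
        lambda m: f"<b>{m.group(0)}</b>",
        text,
        flags=re.IGNORECASE,
    )


top_level = build_indexed_text("DiffBind question", "How do I load peaks?")
reply = build_indexed_text("", "Try the sample sheet.")
```

Here `highlight(top_level, "diffbind")` marks the word inside the indexed text, whose first line is the title. The title shown in the result link would come from a separate source (the post record), so it stays unhighlighted unless the highlighter is applied to it as well.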

@hpages

hpages commented Sep 18, 2014

I see. Thanks for explaining. All that seems a little bit weird and counter-intuitive to me though. I wonder if there is any technical reason why the result of a search couldn't just be displayed like the list of posts I get when I click on a user name. Like here:

https://support.bioconductor.org/u/2360/

After all that list is also the result of a search ("search all the posts from that user"). Having the nb of votes/answers/views, plus the bottom line with tags and stuff like "written 10 hours ago by Janet Young • 680 • updated 10 hours ago by Martin Morgan ♦♦ 14k" is really great. Having the search terms highlighted in the title and body plus the ability to sort by relevance or reverse chronological would be really neat. Thanks!
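
The two orderings requested above, by relevance or reverse chronological, could look roughly like this once the engine has returned its hits. This is only a sketch: the `Hit` class, its fields, and the order names are hypothetical, and in practice the sorting would more likely be delegated to the search engine itself.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Hit:
    title: str
    score: float        # relevance score assigned by the engine (assumed field)
    created: datetime   # creation date of the post (assumed field)


def sort_hits(hits, order="relevance"):
    """Return hits sorted by descending score or by newest first."""
    if order == "relevance":
        return sorted(hits, key=lambda h: h.score, reverse=True)
    if order == "newest":
        return sorted(hits, key=lambda h: h.created, reverse=True)
    raise ValueError(f"unknown order: {order!r}")


hits = [
    Hit("DESeq2 design question", 0.4, datetime(2014, 9, 14)),
    Hit("DESeq2 error on install", 0.9, datetime(2014, 6, 1)),
    Hit("DESeq2 vs edgeR", 0.7, datetime(2014, 9, 16)),
]
```

Calling `sort_hits(hits, "newest")` puts the most recent post first, while the default call orders by score, so a user-facing dropdown could switch between the two.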

@ialbert

ialbert commented Sep 18, 2014

Search is a relatively new component. We used to rely on Google Domain
search but that will start inserting ads on more popular sites. The new
search engine is a feature that went live with Biostar 2.0 about six months
ago. So we are still at an early stage of understanding how best to make
use of it.

Having search results formatted the same way as the contribution posts is a
good idea that I hadn't considered. Perhaps I ended up being too focused
on creating a different representation of each post because the engines are
indeed very powerful and allow for all kinds of weighting, highlighting and
parsing schemes. But consistency would be better.

There is no limitation that would prevent it from looking the same, only
some minor complexities in how paging and rendering would work, since the
index is not a drop-in replacement for the database table (although it
could probably be made to emulate the database more closely).

Improving the search will be our next focus area.


@hpages

hpages commented Sep 18, 2014

I see. Thanks for the extra details and for providing some background. But most importantly, thanks for the Biostar software!

@mikelove

yes, let me echo: many thanks to Istvan for open-sourcing the Biostar software, and to Dan and Marc as well. As a developer, I find this new support site really useful for communicating with users.
