
Make The Website Searchable

No due date 25% complete

It would be cool and useful to be able to search the new website: to search news, events, or perhaps even our governing documents.

Rough plan:

1. Content Analysis

  • Website Content: Go through the backend databases and create records of the information they hold, for example news and events.
  • PDF Documents: List all PDFs available on the site that need to be searchable.

2. Text Extraction

  • PDF Parsing: Use a library such as PyPDF2 to extract text from the PDFs.
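
A minimal sketch of the extraction step, assuming PyPDF2 2.x or later is installed (it provides `PdfReader` and `page.extract_text()`; the successor package pypdf exposes the same interface). The helper names `normalize_whitespace` and `extract_pdf_text` are hypothetical and not part of any existing code in this project.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace (including line breaks) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def extract_pdf_text(path: str) -> list[str]:
    """Return one cleaned text string per page of the PDF at `path`."""
    # Imported lazily so this module can be loaded without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty string for image-only (scanned) pages.
    return [normalize_whitespace(page.extract_text() or "") for page in reader.pages]
```

Keeping the text per page makes it possible to link a search hit to the page it came from. Scanned PDFs without a text layer would need OCR, which this sketch does not cover.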

3. Indexing

  • Search Engine Selection:
    • Elasticsearch: A distributed, open-source search and analytics engine.
    • Algolia: A hosted alternative, free up to 1 M records and 10 000 searches.
  • Data Structuring: Organize extracted text into a structured format (records) suitable for indexing.
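
The data structuring step could look roughly like this. The record fields (`kind`, `title`, `body`, `url`) and the index name are assumptions made for illustration; the only fixed part is the newline-delimited format of the Elasticsearch Bulk API, where each document is preceded by an action line.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SearchRecord:
    """One searchable unit; the field names are illustrative assumptions."""
    id: str
    kind: str   # e.g. "news", "event", or "pdf"
    title: str
    body: str
    url: str


def to_bulk_ndjson(records: list[SearchRecord], index: str) -> str:
    """Serialize records into the newline-delimited body used by the
    Elasticsearch Bulk API: an action line followed by a document line."""
    lines = []
    for rec in records:
        doc = asdict(rec)
        doc_id = doc.pop("id")  # the id goes in the action line, not the document
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)
```

The resulting string can be sent as the body of a `POST /_bulk` request with the `Content-Type: application/x-ndjson` header. Using a stable `id` per record means re-indexing the same item overwrites it instead of creating a duplicate.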

4. Backend Integration

  • API Development: Create endpoints to handle search queries and return results.
  • Data Synchronization: Ensure the search index is updated with new or modified content.
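
One possible way to handle synchronization, sketched under the assumption that a content hash is stored for every record at indexing time. The plan above does not specify a method, and the names `content_hash` and `plan_sync` are hypothetical. Comparing hashes shows which records need to be re-indexed and which should be removed from the index.

```python
import hashlib
import json


def content_hash(record: dict) -> str:
    """Stable fingerprint of a record's content (key order does not matter)."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_sync(current: dict[str, dict], indexed_hashes: dict[str, str]):
    """Compare backend records with the hashes stored at the last indexing run.

    Returns (to_upsert, to_delete): ids that are new or changed, and ids
    that no longer exist in the backend.
    """
    to_upsert = sorted(
        rec_id for rec_id, rec in current.items()
        if indexed_hashes.get(rec_id) != content_hash(rec)
    )
    to_delete = sorted(set(indexed_hashes) - set(current))
    return to_upsert, to_delete
```

A job like this could run on a schedule. An alternative is to update the index directly whenever the backend saves or deletes a news item or event, which keeps the index fresher but couples the two systems more tightly.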

Additional tips:

  • Highlight Matching Terms: Many search engines (like Elasticsearch) provide options to highlight matching terms in the results.
  • Pagination: Add pagination to avoid loading all results at once.
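
Both tips map onto the Elasticsearch search request body. The sketch below builds such a body with `multi_match`, `highlight`, and `from`/`size`. The field names `title` and `body`, the title boost, and the page size cap of 50 are assumptions, and `build_search_body` is a hypothetical helper.

```python
def build_search_body(text: str, page: int = 1, page_size: int = 10) -> dict:
    """Build an Elasticsearch search body with highlighting and pagination."""
    page = max(page, 1)                     # pages are 1-based
    page_size = min(max(page_size, 1), 50)  # cap to keep responses small
    return {
        # Offset of the first hit on the requested page.
        "from": (page - 1) * page_size,
        "size": page_size,
        "query": {
            "multi_match": {
                "query": text,
                # "^2" weights matches in the title twice as heavily.
                "fields": ["title^2", "body"],
            }
        },
        # Ask for matching fragments of these fields, wrapped in <em> by default.
        "highlight": {"fields": {"title": {}, "body": {}}},
    }
```

A search endpoint could pass the user's query and page number to this function and forward the result to the index's `_search` endpoint. Each hit in the response then carries a `highlight` object with the matching fragments. Note that `from` plus `size` is limited to 10 000 by default, which is fine for ordinary result paging.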

Updates:

** 11/11 2024 **
I have decided to try Elasticsearch for this implementation.
