Make The Website Searchable
No due date
25% complete
It would be cool and useful to be able to search the new website: news, events, or maybe even our governing documents.
Rough plan:
1. Content Analysis
- Website Content: Go through the backend databases and create records of the information stored there, for example news and events.
- PDF Documents: List all PDFs available on the site that need to be searchable.
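A minimal sketch of the PDF inventory step, assuming the PDFs live in a directory on the server's file system (the function name and the idea of a single root folder are assumptions, not how the site is actually set up):

```python
from pathlib import Path


def list_pdfs(root):
    """Return all PDF files under root, sorted, as paths relative to root."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        # Match the extension case-insensitively so "FILE.PDF" is found too.
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

The resulting list can be reviewed by hand to decide which documents should actually be searchable.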
2. Text Extraction
- PDF Parsing: Utilize libraries such as PyPDF2 to extract text from PDFs.
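The extraction step could look roughly like this with PyPDF2's `PdfReader`. The helper names are made up for illustration, and scanned (image-only) PDFs would probably need OCR instead, which this sketch does not cover:

```python
import re


def clean_text(raw):
    """Collapse runs of whitespace that PDF extraction tends to leave behind."""
    return re.sub(r"\s+", " ", raw or "").strip()


def extract_pdf_text(path):
    """Return one cleaned text string per page of the PDF at path."""
    # Imported lazily so clean_text() works without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # extract_text() may yield an empty string for image-only pages.
    return [clean_text(page.extract_text()) for page in reader.pages]
```

Keeping the text per page makes it possible to link a search hit to the right page of the document later on.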
3. Indexing
- Search Engine Selection:
  - Elasticsearch: A distributed, open-source search and analytics engine.
  - Algolia: A hosted alternative, free up to 1 M records and 10,000 searches.
- Data Structuring: Organize extracted text into a structured format (records) suitable for indexing.
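One possible record shape, plus a helper that turns records into the action dicts accepted by `elasticsearch.helpers.bulk`. The field names, the `SearchRecord` class, and the index name `website` are assumptions chosen for this sketch:

```python
from dataclasses import dataclass, asdict


@dataclass
class SearchRecord:
    id: str      # unique per record, e.g. "news-12" or "pdf-stadgar-p3"
    kind: str    # "news", "event", or "pdf"
    title: str
    body: str
    url: str


def to_bulk_actions(records, index_name="website"):
    """Turn records into action dicts in the format helpers.bulk expects."""
    actions = []
    for r in records:
        source = asdict(r)
        doc_id = source.pop("id")  # the id becomes the document _id
        actions.append({"_index": index_name, "_id": doc_id, "_source": source})
    return actions


def index_records(client, records, index_name="website"):
    """Send records to Elasticsearch; client is an elasticsearch.Elasticsearch."""
    from elasticsearch import helpers  # lazy import; needs the elasticsearch package

    return helpers.bulk(client, to_bulk_actions(records, index_name))
```

Using the record id as the document `_id` means that re-indexing the same record overwrites the old version instead of creating a duplicate.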
4. Backend Integration
- API Development: Create endpoints to handle search queries and return results.
- Data Synchronization: Ensure the search index is updated with new or modified content.
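For the synchronization part, one simple approach (an assumption, not a decided design) is to store a hash of each record at indexing time and compare it against the current backend content. The function names below are hypothetical:

```python
import hashlib
import json


def content_hash(record):
    """Stable fingerprint of a record dict, used to detect modified content."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_sync(current, indexed_hashes):
    """Compare backend records with what the index already holds.

    current: {record_id: record_dict} read from the backend.
    indexed_hashes: {record_id: hash} saved at the last indexing run.
    Returns (ids_to_upsert, ids_to_delete), both sorted.
    """
    to_upsert = sorted(
        rid for rid, rec in current.items()
        if indexed_hashes.get(rid) != content_hash(rec)  # new or changed
    )
    to_delete = sorted(rid for rid in indexed_hashes if rid not in current)
    return to_upsert, to_delete
```

This could run on a schedule or after each content change in the backend; only the records in `ids_to_upsert` need to be sent to the search engine again.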
Additional tips:
- Highlight Matching Terms: Many search engines (like Elasticsearch) provide options to highlight matching terms in the results.
- Pagination: Add pagination to avoid loading all results at once.
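Both tips can be sketched as an Elasticsearch request body: `highlight` asks for matching fragments and `from`/`size` handle pagination. The field names `title`, `body`, and `url`, the page size cap, and the function names are assumptions. How the body is passed to the client depends on the client version, so that call is left out:

```python
def build_search_body(query, page=1, page_size=10):
    """Build an Elasticsearch request body with highlighting and pagination."""
    page = max(1, page)
    page_size = max(1, min(page_size, 50))  # cap so one request stays small
    return {
        # Search title and body; "^2" weights title matches higher.
        "query": {"multi_match": {"query": query, "fields": ["title^2", "body"]}},
        "highlight": {"fields": {"title": {}, "body": {}}},
        "from": (page - 1) * page_size,
        "size": page_size,
    }


def shape_results(response):
    """Reduce a raw Elasticsearch response to what the frontend needs."""
    hits = response["hits"]
    return {
        "total": hits["total"]["value"],
        "results": [
            {
                "id": h["_id"],
                "title": h["_source"].get("title", ""),
                "url": h["_source"].get("url", ""),
                # Maps field name to a list of fragments with <em> tags.
                "highlights": h.get("highlight", {}),
            }
            for h in hits["hits"]
        ],
    }
```

The search endpoint would then only need to read the query and page number from the request, call the search engine with this body, and return the shaped result as JSON.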
Updates:
** 11/11 2024 **
I have decided to try Elasticsearch for this implementation.