I'm a final-year Computer Science & Engineering undergraduate at the University of Moratuwa, specializing in Data Science & Engineering. With a passion for building web applications and exploring AI technologies, I'm currently building a state-of-the-art end-to-end RAG system for long-context financial documents, specifically annual reports.
Note: Jupyter Notebooks have been excluded from my language stats because they accounted for over 90% of my "most used languages," which does not reflect the code I actually write. Here's why:
- GitHub determines language usage stats using Linguist, which calculates language usage based on file size rather than the actual number of lines of code.
- Jupyter Notebook files (`.ipynb`) are JSON-based and often include significant metadata, such as:
  - Code and outputs.
  - Visualizations (e.g., plots, tables).
  - Large embedded data or images (encoded in base64).
- This results in a file size bias, making Jupyter Notebooks appear disproportionately prominent in language stats.
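
To make the bias concrete, here is a minimal Python sketch that estimates how much of a notebook's size is actual source code. It is only an illustration, not how Linguist itself works: the `code_share` helper and the sample notebook (with a fake 20,000-character base64 image) are hypothetical, and the only assumption about the format is the standard nbformat layout of `cells`, `cell_type`, and `source`.

```python
import json


def code_share(notebook_json: str) -> float:
    """Return the fraction of a notebook's bytes that is code-cell source."""
    nb = json.loads(notebook_json)
    code_bytes = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat allows source to be a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        code_bytes += len(source.encode("utf-8"))
    total_bytes = len(notebook_json.encode("utf-8"))
    return code_bytes / total_bytes if total_bytes else 0.0


# Hypothetical notebook: two lines of code plus one embedded "image" output.
sample = json.dumps({
    "cells": [{
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {},
        "source": ["import matplotlib.pyplot as plt\n", "plt.plot([1, 2, 3])"],
        "outputs": [{
            "output_type": "display_data",
            "data": {"image/png": "A" * 20000},  # stand-in for base64 image data
            "metadata": {},
        }],
    }],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

print(f"Code share of file size: {code_share(sample):.2%}")
```

In this made-up example the code is a tiny fraction of the file, yet a size-based count would attribute every byte to Jupyter Notebook.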