This project is a Streamlit-based application designed to process banking files using AI (a Naive Bayes classifier).
It is fully functional but also highly adaptable and easy to extend to meet individual requirements.
- Install Dependencies:
  The primary dependencies are listed in requirements.in. To generate the actual requirements.txt file, run:
  pip-compile requirements.in
  This command, provided by the pip-tools package, resolves and pins all secondary dependencies, ensuring compatibility.
- Set Up Environment:
  - Create a new virtual environment:
    python -m venv venv
  - Activate the virtual environment:
    - On Windows:
      venv\Scripts\activate
    - On macOS/Linux:
      source venv/bin/activate
  - Install the dependencies:
    pip install -r requirements.txt
- Run the Application:
  Start the Streamlit application with:
  streamlit run app.py
The application uses a local SQLite3 database to manage data.
- Database Initialization: On the first run, the database is automatically created in the root directory alongside app.py if it doesn't already exist.
- External Integration: The SQLite3 database can be connected to external reporting tools like Power BI for additional analysis and visualization.
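The create-on-first-run behaviour can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the project's actual code: the file name banking.db, the transactions table, and its columns are assumptions made for the example.

```python
import sqlite3
import tempfile
from pathlib import Path

DB_NAME = "banking.db"  # hypothetical file name, not taken from the project


def init_db(db_path):
    """Open the database, creating the file and schema if they are missing."""
    is_new = not Path(db_path).exists()
    conn = sqlite3.connect(db_path)  # SQLite creates the file when it is absent
    # Hypothetical schema; IF NOT EXISTS makes repeated start-ups harmless.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS transactions (
            id INTEGER PRIMARY KEY,
            booking_date TEXT NOT NULL,
            description TEXT NOT NULL,
            amount REAL NOT NULL,
            category TEXT
        )
        """
    )
    conn.commit()
    return conn, is_new


# Demo in a temporary directory so nothing is written next to app.py.
with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / DB_NAME

    conn, created_first = init_db(db_file)   # first run: file is created
    conn.close()

    conn, created_second = init_db(db_file)  # later run: file is reused
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    ]
    conn.close()

print(created_first, created_second, tables)
```

Because the resulting file is an ordinary SQLite database, reporting tools can read it as long as they have a SQLite driver or connector available.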
All core data processing logic resides in the backend.
- API Design: While this version uses an SQLite3 API for local data storage, the structure mimics the Google Cloud API.
- Cloud Compatibility: This local version is a simplified adaptation of a larger application designed for Google Cloud Run and BigQuery. The local setup allows users to run the app in their environment with minimal modifications.
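One way to keep a local backend interchangeable with a cloud one is to code against a small storage interface. The sketch below is an assumption about how such a layer could look, not the project's real API: TransactionStore, SQLiteStore, insert_rows, query, and total_spent are all hypothetical names, and no Google Cloud client is shown.

```python
import sqlite3
from typing import Dict, List, Protocol


class TransactionStore(Protocol):
    """Hypothetical interface shared by a local and a cloud backend."""

    def insert_rows(self, rows: List[Dict]) -> int: ...

    def query(self, sql: str) -> List[Dict]: ...


class SQLiteStore:
    """Local backend built on the standard-library sqlite3 module."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.row_factory = sqlite3.Row  # rows can be converted to dicts
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transactions (description TEXT, amount REAL)"
        )

    def insert_rows(self, rows):
        # Named placeholders are filled from each dict in `rows`.
        self.conn.executemany(
            "INSERT INTO transactions (description, amount) "
            "VALUES (:description, :amount)",
            rows,
        )
        self.conn.commit()
        return len(rows)

    def query(self, sql):
        return [dict(row) for row in self.conn.execute(sql)]


def total_spent(store: TransactionStore) -> float:
    """Backend logic that depends only on the interface, not on SQLite."""
    rows = store.query("SELECT SUM(amount) AS total FROM transactions")
    return rows[0]["total"] or 0.0  # SUM over an empty table yields NULL


store = SQLiteStore()
store.insert_rows(
    [
        {"description": "Supermarket", "amount": -50.0},
        {"description": "Fuel station", "amount": -12.5},
    ]
)
print(total_spent(store))
```

With this shape, a cloud-backed class would only need to provide the same two methods for functions such as total_spent to keep working unchanged.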
Adding Pages:
Streamlit’s modular nature makes it easy to add new pages. Simply follow the structure in app.py and create additional APIs as needed.
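As a rough illustration of adding a page, the sketch below uses a dictionary that maps page names to render functions. The actual layout of app.py may differ: PAGES, render_upload, render_reports, and render_app are hypothetical names. The functions receive the Streamlit module as the ui argument, which keeps them easy to test without a running app.

```python
def render_upload(ui):
    """Hypothetical page: upload and process a banking file."""
    ui.header("Upload banking file")


def render_reports(ui):
    """Hypothetical page: show processed transactions."""
    ui.header("Reports")


# Adding a page means writing one function and registering it here.
PAGES = {
    "Upload": render_upload,
    "Reports": render_reports,
}


def render_app(ui):
    """Let the user pick a page in the sidebar, then render it."""
    choice = ui.sidebar.selectbox("Page", list(PAGES))
    PAGES[choice](ui)


# In a Streamlit script this would be wired up as:
#     import streamlit as st
#     render_app(st)
```

Here st.sidebar.selectbox and st.header are standard Streamlit calls; everything else is illustrative.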
This project is the fourth iteration of my banking file processor, and I have tested several different solutions.
- Model Selection: After experimenting with various algorithms, Naive Bayes proved the most effective for this application.
- It performs well with limited and highly variable data, which is typical for banking transaction files.
- Alternatives such as Random Forests, Boosted Trees, and Neural Networks with more advanced features showed minimal benefit while adding unnecessary complexity.
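To show why Naive Bayes copes with small, varied data, here is a minimal multinomial Naive Bayes text classifier written with only the standard library. It is an illustrative sketch, not the model used in this project: the class TinyNaiveBayes, the whitespace tokenizer, the category names, and the training rows are all made up for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a transaction description and split it on whitespace."""
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data; real banking exports will look different.
train_texts = [
    "REWE supermarket card payment",
    "ALDI supermarket card payment",
    "Shell fuel station",
    "Aral fuel station card payment",
    "Monthly rent transfer",
    "Rent transfer landlord",
]
train_labels = ["groceries", "groceries", "fuel", "fuel", "rent", "rent"]

model = TinyNaiveBayes().fit(train_texts, train_labels)
print(model.predict("LIDL supermarket payment"))
print(model.predict("fuel station payment"))
```

The model only stores word counts per category, so it can be trained on a handful of rows, and smoothing keeps an unfamiliar merchant name from ruling out a category on its own.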
To test the model and see what it does, run ml_test.ipynb with the included test-data.csv, or use your own data.
Feel free to fork, modify, and extend this project for your own use!