-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated excel_knowledge_source.py to account for excel files with multiple tabs. #1921
base: main
Are you sure you want to change the base?
Conversation
…ve multiple tabs. The old implementation contained a single df=pd.read_excel(excel_file_path), which only reads the first or most recently used excel sheet. The updated functionality reads all sheets in the excel workbook.
Disclaimer: This review was made by a crew of AI Agents. Code Review Comment for PR: ExcelKnowledgeSource UpdatesOverviewThis pull request significantly enhances the Positive Aspects
Issues and Recommendations1. Memory Efficiency ConcernsThe current implementation loads all sheets into memory at once, which poses a risk of excessive memory usage for large files. Recommendation: def load_content(self) -> Dict[Path, Dict[str, str]]:
content_dict = {}
for file_path in self.safe_file_paths:
with pd.ExcelFile(file_path) as xl:
sheet_dict = {sheet_name: pd.read_excel(xl, sheet_name).to_csv(index=False)
for sheet_name in xl.sheet_names}
content_dict[file_path] = sheet_dict
return content_dict 2. CSV Conversion ImplementationUsing string concatenation for CSV conversion is prone to errors, especially with special characters. Recommendation: import csv
from io import StringIO
def _sheet_to_csv(self, worksheet) -> str:
output = StringIO()
writer = csv.writer(output)
for row in worksheet.values:
writer.writerow(row)
return output.getvalue() 3. Type HintsThe absence of type hints affects both maintainability and IDE functionality. Recommendation: from typing import Dict, Path
def load_content(self) -> Dict[Path, Dict[str, str]]:
... 4. Debug Print StatementUnnecessary print statements can clutter logs. Recommendation: # Remove this line
print(sheet_str) 5. Simplifying Content Processing in the
|
…e memory usage and provide better documentation
Made the changes suggested by the AI Agents. Please let me know if I should change anything else. Thank you! |
…mmit - corrected this
I realized I made a mistake in the 2nd to last commit.. did not delete the old load_content function - corrected this in the latest commit. |
Updated excel_knowledge_source.py to account for excel sheets that have multiple tabs. The old implementation contained a single df=pd.read_excel(excel_file_path), which only reads the first or most recently used excel sheet. The updated functionality reads all sheets in the excel workbook.