Feedback #1

github-classroom · 2024-11-08T09:40:54Z

👋! GitHub Classroom created this pull request as a place for your teacher to leave feedback on your work. It will update automatically. Don’t close or merge this pull request, unless you’re instructed to do so by your teacher.
In this pull request, your teacher can leave comments and feedback on your code. Click the Subscribe button to be notified if that happens.
Click the Files changed or Commits tab to see all of the changes pushed to the default branch since the assignment started. Your teacher can see this too.

Notes for teachers

Use this PR to leave feedback. Here are some tips:

Click the Files changed tab to see all of the changes pushed to the default branch since the assignment started. To leave comments on specific lines of code, put your cursor over a line of code and click the blue + (plus sign). To learn more about comments, read “Commenting on a pull request”.
Click the Commits tab to see the commits pushed to the default branch. Click a commit to see specific changes.
If you turned on autograding, then click the Checks tab to see the results.
This page is an overview. It shows commits, line comments, and general comments. You can leave a general comment below.
For more information about this pull request, read “Leaving assignment feedback in GitHub”.



Jinsu-L · 2024-12-08T14:28:58Z

notebooks/demo_data_processing.ipynb

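This is the code for preprocessing the demo data, and you described it in the main README. Rather than keeping it as a notebook, though, it might be better to turn it into a Python file and integrate it into the demo directory.
Personally, I think the main reason to use a notebook is visualization, so when there is no visualization, writing it as a plain Python file would make it more reusable.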

Jinsu-L · 2024-12-08T14:31:48Z

notebooks/demo_data_processing.ipynb

+    "transformed_data = []\n",
+    "\n",
+    "# Iterate over each row of the DataFrame and transform the data\n",
+    "for idx, row in df.iterrows():\n",


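Reading the DataFrame one row at a time with iterrows and building a new DataFrame from it (duplicating ids and other data along the way) will use a lot of memory on large datasets because of the duplicated data. Transforming the data with functions like map and apply instead should improve readability and save memory.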
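A minimal sketch of the difference, assuming a DataFrame with hypothetical `id`, `paragraph`, and `choices` columns (the notebook's real columns and transformation may differ). The first function mirrors the `iterrows` pattern; the second expresses the same transformation column-wise with `Series.str.strip` and `Series.map`, so no per-row `Series` objects or intermediate list of dicts are created.

```python
import pandas as pd

# Hypothetical sample data; the real notebook's columns may differ.
df = pd.DataFrame({
    "id": ["q1", "q2"],
    "paragraph": ["  first passage ", "second passage"],
    "choices": [["A", "B"], ["C", "D", "E"]],
})


def transform_iterrows(df: pd.DataFrame) -> pd.DataFrame:
    # Row-by-row pattern: iterrows() yields a new Series for every row,
    # and the rows are collected in a Python list before a second DataFrame is built.
    rows = []
    for _, row in df.iterrows():
        rows.append({
            "id": row["id"],
            "paragraph": row["paragraph"].strip(),
            "num_choices": len(row["choices"]),
        })
    return pd.DataFrame(rows)


def transform_columnwise(df: pd.DataFrame) -> pd.DataFrame:
    # Column-wise pattern: each output column is one expression over a whole Series.
    return pd.DataFrame({
        "id": df["id"],
        "paragraph": df["paragraph"].str.strip(),
        "num_choices": df["choices"].map(len),
    })


print(transform_columnwise(df))
```

Both functions produce the same rows; the column-wise version keeps one expression per output column, which is usually easier to read and faster on large frames.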

Jinsu-L · 2024-12-08T14:40:16Z

notebooks/ft_data_processing.ipynb

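The file name ft_data_processing doesn't seem very intuitive.
Also, between the name and the fact that it is a Jupyter notebook, it feels like a file that doesn't need to be run. Could this code become a regular Python file and live in src/preprocessing.py with the rest of the project's preprocessing?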

Jinsu-L · 2024-12-08T14:40:29Z

notebooks/eda.ipynb

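You've made good visualizations. Since I think the point of uploading a notebook to the web is to leave notes alongside the visualizations, adding a Markdown cell below each graph that summarizes what it shows would make the analysis results easier to understand from this file alone.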

Jinsu-L · 2024-12-09T10:54:03Z

src/dataset.py

+            records.append(record)
+
+        # Convert processed records to a DataFrame
+        data_df = pd.DataFrame(records)


Using apply here would likely be better for readability and reusability.
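A minimal sketch of the `apply` version, assuming each raw row has `id`, `paragraph`, and a stringified dict in a `problems` column with `question`, `choices`, and `answer` keys (an assumption about the dataset; the fields actually built in `src/dataset.py` may differ). The per-row logic lives in one named function, the hypothetical `to_record`, which can be reused and tested on its own, and `ast.literal_eval` parses the stringified dict without executing arbitrary code.

```python
import ast

import pandas as pd


def to_record(row: pd.Series) -> dict:
    # Hypothetical per-row transformation: parse the stringified `problems` dict
    # safely with ast.literal_eval and flatten it into one record.
    problems = ast.literal_eval(row["problems"])
    return {
        "id": row["id"],
        "paragraph": row["paragraph"],
        "question": problems["question"],
        "choices": problems["choices"],
        # .get tolerates a missing "answer" key (assumed possible for unlabeled data)
        "answer": problems.get("answer"),
    }


def build_dataset(raw_df: pd.DataFrame) -> pd.DataFrame:
    # apply(axis=1) calls to_record once per row; because it returns dicts,
    # the result is a Series of dicts, which becomes a DataFrame via tolist().
    records = raw_df.apply(to_record, axis=1)
    return pd.DataFrame(records.tolist())
```

Note that `DataFrame.apply(axis=1)` still calls the function once per row, so the gain here is mainly readability and reuse rather than speed.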

Jinsu-L · 2024-12-09T10:58:08Z

src/ensemble.py

+    print("\n정답 개수 분포:")
+    print(final_df['answer'].value_counts().sort_index())
+
+def main():


It would be good to move the hard-coded csv_files and weights out of the main function, or to accept them as arguments (args).
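One way to do this is with the standard-library `argparse` module. The flag names `--csv_files`, `--weights`, and `--output` below are hypothetical, and defaulting to equal weights is an assumption rather than the current behavior of `src/ensemble.py`.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI for src/ensemble.py: file paths and weights come from the
    # command line instead of being hard-coded inside main().
    parser = argparse.ArgumentParser(description="Weighted ensemble of prediction CSV files")
    parser.add_argument("--csv_files", nargs="+", required=True,
                        help="paths to the prediction CSV files to combine")
    parser.add_argument("--weights", nargs="+", type=float, default=None,
                        help="one weight per CSV file (default: equal weights)")
    parser.add_argument("--output", default="ensemble_output.csv",
                        help="path for the combined predictions")
    args = parser.parse_args(argv)

    if args.weights is None:
        args.weights = [1.0] * len(args.csv_files)
    if len(args.weights) != len(args.csv_files):
        # parser.error prints a usage message and exits with status 2
        parser.error("--weights needs exactly one value per --csv_files entry")
    return args


def main(argv=None):
    args = parse_args(argv)
    # Load args.csv_files, combine them with args.weights, and write args.output here.
    return args
```

`main()` would then be called from an `if __name__ == "__main__":` guard, and a run might look like `python src/ensemble.py --csv_files a.csv b.csv --weights 0.7 0.3` (hypothetical file names).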

Jinsu-L · 2024-12-09T11:04:08Z

src/model.py

Structure is partly a matter of preference, but model.py currently contains model training, tokenization, evaluation, and more, so it works more like a trainer. That class holds too many responsibilities, so it may be worth splitting it up.

Jinsu-L · 2024-12-09T11:08:33Z

src/preprocessing.py

+
+
+# Usage example
+if __name__ == "__main__":


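This looks like pipeline code that fetches external data and preprocesses it.
It sits in the same folder as the existing model training code, so it would be good to move it elsewhere. The file itself also handles many roles, from crawling through preprocessing, so splitting it into separate files would help as well.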
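A minimal sketch of that separation, under stated assumptions: the stage boundaries, the function names `crawl`, `preprocess`, and `run_pipeline`, and the whitespace normalization are all hypothetical. The default `min_length=50` only echoes the threshold change in commit 828b6c9; its unit there (characters or tokens) is not stated, and characters are assumed here. Injecting `fetch` keeps network code out of the preprocessing stage, so preprocessing can be tested without network access.

```python
from typing import Callable, Iterable


# --- Crawling stage (hypothetical separate module): only collects raw text ---
def crawl(urls: Iterable[str], fetch: Callable[[str], str]) -> list[dict]:
    """Fetch each URL with the injected `fetch` function and return raw records."""
    return [{"url": url, "text": fetch(url)} for url in urls]


# --- Preprocessing stage: pure functions, no network access ---
def normalize_whitespace(text: str) -> str:
    # Collapse runs of whitespace into single spaces and trim both ends.
    return " ".join(text.split())


def preprocess(records: list[dict], min_length: int = 50) -> list[dict]:
    """Normalize whitespace and drop records shorter than `min_length` characters."""
    cleaned = []
    for record in records:
        text = normalize_whitespace(record["text"])
        if len(text) >= min_length:
            cleaned.append({**record, "text": text})
    return cleaned


# --- Thin orchestration layer that wires the stages together ---
def run_pipeline(urls: Iterable[str], fetch: Callable[[str], str], min_length: int = 50) -> list[dict]:
    return preprocess(crawl(urls, fetch), min_length=min_length)
```

In a real crawler, `fetch` could wrap an HTTP client; in tests it can be a plain dictionary lookup.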

Jinsu-L · 2024-12-09T11:11:23Z

src/retrieval_dense.py

+        # List of retrieved documents
+        references = eval(row['reference'])
+
+        K = len(references)  # Number of retrieved documents (Top K)


The metric calculation code seems to be duplicated across the retrieval methods. It would be better to group it into a shared function.

Jinsu-L · 2024-12-09T11:12:21Z

src/retrieval_sparse.py

+        return []
+
+
+def evaluate_metrics_threshold(df, retriever):


This code is duplicated across several files, so it might be good to separate the evaluation metrics into their own module (e.g., evaluation_metric).
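A minimal sketch of such a shared module, which could also address the duplication noted for `src/retrieval_dense.py`. Hit@k and MRR are stand-in metrics, and the module and function names are hypothetical; the metrics actually computed in the retrieval files may differ. It also replaces the `eval(row['reference'])` call with `ast.literal_eval`, which parses a stringified list without executing arbitrary code.

```python
import ast
from typing import Iterable, Sequence, Tuple


# Hypothetical shared module (e.g. src/evaluation_metric.py) that both
# retrieval_dense.py and retrieval_sparse.py could import.

def parse_references(raw) -> list:
    """Parse a stringified list such as "['doc1', 'doc2']" without using eval()."""
    return ast.literal_eval(raw) if isinstance(raw, str) else list(raw)


def hit_at_k(retrieved: Sequence[str], gold: str, k: int) -> float:
    # 1.0 if the gold document appears among the top-k retrieved documents.
    return float(gold in retrieved[:k])


def reciprocal_rank(retrieved: Sequence[str], gold: str) -> float:
    # 1 / rank of the first match (ranks start at 1), or 0.0 if the gold doc is missing.
    for rank, doc in enumerate(retrieved, start=1):
        if doc == gold:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(results: Iterable[Tuple[Sequence[str], str]], k: int) -> dict:
    """Average Hit@k and MRR over (retrieved_docs, gold_doc) pairs."""
    hits, rr, n = 0.0, 0.0, 0
    for retrieved, gold in results:
        hits += hit_at_k(retrieved, gold, k)
        rr += reciprocal_rank(retrieved, gold)
        n += 1
    if n == 0:
        return {f"hit@{k}": 0.0, "mrr": 0.0}
    return {f"hit@{k}": hits / n, "mrr": rr / n}
```

Each retriever would then only produce `(retrieved_docs, gold_doc)` pairs, and the metric logic would live in one place.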


github-classroom bot and others added 30 commits November 8, 2024 09:40

Setting up GitHub Classroom Feedback

d9082c0

feat: Modularized Baseline

d8f9da9

docs: Update README.md

90ec293

feat: Quantization

ed60b89

fix: 8-bit quantization config

65a7728

feat: Add function to create uniform answer distribution

d9f5d41

docs: Add template files

3d5c35f

fix: Align modularized code with the original baseline

1c0fe95

fix: Add torch_dtype in config

8b20078

docs: Update README.md

18d2417

Merge pull request #4 from boostcampaitech7/fix/baseline_template

887d585

fix: Align modularized code with the original baseline

Merge pull request #3 from boostcampaitech7/docs

0201918

docs: Add template files

Merge pull request #2 from boostcampaitech7/feat/uniform_answer_distr…

873340b

…ibution feat: Add function to create uniform answer distribution

fix: train.csv

079493a

Merge pull request #5 from boostcampaitech7/develop

9213a8b

merge: Baseline reproduction done on develop branch

fix: src/utils.py path

8875263

feat: Add wikipedia preprocessing function

dcc5c29

fix: Change context length filtering threshold (100 -> 50)

828b6c9

feat: Add crawling openstax textbook function

2a09669

GENNLP-9: feat: Add EDA script for data analysis

bb77e2b

feat: Add crawling code for korean history textbook

92914ed

- url: http://contents.history.go.kr/front/ta/view.do?levelId=ta_h71_0010

feat: Add crawling code for korean history terms

089e304

- url: http://contents.history.go.kr/front/tg/main.do

fix: Fix section name extraction error

75cdd88

Merge pull request #7 from boostcampaitech7/GENNLP-9-Exploratory-Data…

3388227

…-Analysis GENNLP-9: feat: Add EDA script for data analysis

GENNLP-29: refactor: Enable configurations changes via config.yaml an…

01c64de

…d integrate unsloth

Merge pull request #8 from boostcampaitech7/refactor/GENNLP-29-Unsloth

a29ea56

refactor/GENNLP-29-Unsloth

feat: Add gpt_api_template, enables data augmentation and other tasks…

f6978c9

… using GPT API

refactor: Change prompt configuration

e3c7669

fix: Makr dir checkpoint folder before assigning config.yaml

2ce92fe

add prompt_templates and modify config

8456d99

ssunbear and others added 24 commits December 2, 2024 16:54

Update README.md

e54cde7

Update README.md

3132dba

Update README.md

0f7ff07

Update README.md

bf0dddc

Update README.md

2ba9348

Update README.md

ebeeb26

Update README.md

492ea8a

Update README.md

eeeb1eb

Update README.md

54a338b

Add Demo

a4d4ad2

Add timeline image for README

c25d0c2

docs: Add timeline image to README

f977d10

Merge branch 'main' of https://github.com/boostcampaitech7/level2-nlp…

ae8943e

…-generationfornlp-nlp-05-lv3 into feat/GENNLP-25-wikipedia

feat: Add wikipedia preprocessing

2e68c60

Merge branch 'main' of https://github.com/boostcampaitech7/level2-nlp…

9064add

…-generationfornlp-nlp-05-lv3 into feat/GENNLP-26-korean-history

Merge branch 'main' of https://github.com/boostcampaitech7/level2-nlp…

eda2f1b

…-generationfornlp-nlp-05-lv3 into feat/GENNLP-27-openstax

Merge branch 'feat/GENNLP-26-korean-history' of https://github.com/bo…

6cfaa82

…ostcampaitech7/level2-nlp-generationfornlp-nlp-05-lv3 into feat/preprocessing

Merge branch 'feat/GENNLP-27-openstax' of https://github.com/boostcam…

29e354b

…paitech7/level2-nlp-generationfornlp-nlp-05-lv3 into feat/preprocessing

feat: Combine preprocessing and crawling codes

353a21a

Merge pull request #19 from boostcampaitech7/feat/preprocessing

e4d4b58

[FEAT] Add external data collection code

Delete duplicated file

87ce95e

Delete duplicated file

79dc136

Add preprocessing code information in code structure

f2cbc36

add: translation for openstax vectorstore

225ee53

Jinsu-L reviewed Dec 8, 2024


Jinsu-L reviewed Dec 9, 2024


jin-jae added 2 commits December 30, 2024 04:02

Merge pull request #20 from boostcampaitech7/feat/GENNLP-27-openstax

503592c

add: translation for openstax vectorstore

fix: requirements dependency

51111e6
