03 - Spain #18

erwangranger · 2025-01-08T21:46:20Z

erwangranger
Jan 8, 2025
Maintainer

This is the thread where you can paste your notes based on the mission.

erwangranger · 2025-01-17T14:12:03Z

erwangranger
Jan 17, 2025
Maintainer Author

I tried 0.5, but that was silly. It should have been 0.3.

0 replies

camille-sophiie · 2025-01-17T14:12:59Z

camille-sophiie
Jan 17, 2025

Make sure to pip install the required libraries.
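
One way to see which libraries are missing before running the whole notebook is to probe for them first. This is only a minimal sketch: the helper name `missing_packages` is made up for illustration, and the package list is an assumption based on the errors reported in this thread, not the notebook's actual requirements.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names; adjust to whatever the notebook actually imports.
candidates = ["seaborn", "matplotlib", "sklearn", "pandas"]
print("Missing:", missing_packages(candidates))
```

Anything it lists can then be installed from a notebook cell, for example with !pip install seaborn.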

0 replies

syntaxsdev · 2025-01-17T14:13:59Z

syntaxsdev
Jan 17, 2025

My first thought was to just run the entire notebook to see what worked.

Running it identified packages that weren't installed. I didn't know how to access the terminal, so I did a !pip install xxx in a cell above.

For the next error, I set test_size to 0.3 because 30% of the data should be used for testing.

Then I continued to run through and read the comments.

I then changed plt.show_me_the_stars() to plt.show() to display the confusion matrix.

0 replies

NelsonLomboPaez · 2025-01-17T14:14:00Z

NelsonLomboPaez
Jan 17, 2025

Mission Spain

In the train-model step, we get the following error:

InvalidParameterError: The 'test_size' parameter of train_test_split must be a float in the range (0.0, 1.0), an int in the range [1, inf) or None. Got 1.5 instead.

since we are splitting the data like this:

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=1.5, random_state=42)

My first idea was to change test_size to 0.5, since it's the midpoint between 0.0 and 1.0. That solved the error, but I need to investigate whether this is the correct value.

With that value, I get a pretty decent model accuracy of:

Model Accuracy: 99.17%

In the evaluate-model step, we call a function show_me_the_stars(), which does not exist. I don't think it is part of any library, so I'm just commenting it out for now.
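
The rule quoted in the error message can be written out as a small check. This is a sketch that mirrors the wording of the message, not scikit-learn's own validation code, and `is_valid_test_size` is a hypothetical helper name.

```python
def is_valid_test_size(test_size):
    """Mirror the rule from the error message (illustrative helper only)."""
    if test_size is None:
        return True                      # None is accepted
    if isinstance(test_size, int):
        return test_size >= 1            # an int is an absolute number of test samples
    if isinstance(test_size, float):
        return 0.0 < test_size < 1.0     # a float is a fraction of the dataset
    return False


for value in (1.5, 0.5, 0.3, 100, None):
    print(value, is_valid_test_size(value))
```

Both 0.5 and 0.3 pass this check, so the error alone does not say which split is the better choice.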

0 replies

alpajain23 · 2025-01-17T14:15:19Z

alpajain23
Jan 17, 2025

I got an error at step 15 while training the model. After searching online, I reviewed this document, https://scikit-learn.org/1.5/modules/generated/sklearn.model_selection.train_test_split.html, and changed test_size to 0.3.

0 replies

thom-at-redhat · 2025-01-17T14:15:38Z

thom-at-redhat
Jan 17, 2025

Module not found: seaborn

Insert a cell above 1
!pip install seaborn

Pip not up to date

Insert a cell above (new) 1
!pip install --upgrade pip

test_size=1.5

Change to: 0.3

plt.show_me_the_stars()

Comment this out or
Change to: plt.show()

0 replies

hashnao · 2025-02-12T18:26:27Z

hashnao
Feb 12, 2025

ModuleNotFoundError: No module named 'seaborn'
InvalidParameterError: The 'test_size' parameter of train_test_split must be a float in the range (0.0, 1.0), an int in the range [1, inf) or None. Got 1.5 instead.
The error message clearly states that the issue is with the test_size parameter in the train_test_split function. You've provided the value 1.5, which is not within the acceptable range.
The error message indicates that the colormap 'Blue' is not recognized. The seaborn function sns.heatmap() expects a valid colormap name, but 'Blue' is not a standard name for a colormap in seaborn.

0 replies

BrianZachary · 2025-02-12T18:28:03Z

BrianZachary
Feb 12, 2025

Had to pip install seaborn
Had to fix test size to .3 from 1.5
Had to replace Blue with Blues in sns.heatmap function

Stretch goal - schedule pipeline run
Create pipeline in Jupyter interface
Drag notebook to pipeline (choose data science with python 3.11 as runtime image in properties)
Run pipeline from Jupyter
From the DataScience pipelines tab in RHOAI interface, select Action -> Create Schedule
Spanish CET is +1 UTC so schedule daily for 1AM UTC
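
The time-zone arithmetic in the last step can be double-checked with a few lines. This sketch assumes CET as a fixed UTC+1 offset and ignores daylight saving time (in summer, Spain is on CEST, UTC+2). The 02:00 local target time is only an illustrative assumption, and `cet_to_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET treated as a fixed UTC+1 offset; daylight saving time is ignored.
CET = timezone(timedelta(hours=1), name="CET")


def cet_to_utc(hour, minute=0):
    """Convert a CET wall-clock time to (hour, minute) in UTC."""
    local = datetime(2025, 2, 13, hour, minute, tzinfo=CET)  # arbitrary winter date
    utc = local.astimezone(timezone.utc)
    return utc.hour, utc.minute


print(cet_to_utc(2))  # (1, 0): 02:00 CET corresponds to 01:00 UTC
```

If the schedule needs to follow local time all year round, the UTC hour would have to shift by one during summer time.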

0 replies

pbuehl-redhat · 2025-02-12T18:35:47Z

pbuehl-redhat
Feb 12, 2025

The hardest part for me was the pipelines. What I did first was create a requirements.txt in the same directory, under .../lab-materials/03/, using the following command:

pip freeze > requirements.txt

Then I clicked on run as pipeline and set * as the file extension value. This creates a non-recurring pipeline. You will then need to switch the non-recurring pipeline to daily, and it should work.

0 replies

lcardonag · 2025-02-12T19:31:59Z

lcardonag
Feb 12, 2025

Need to install seaborn:

pip install seaborn

Correct the test size to 0.3 (30%)

Split the data into training and testing sets

30% of the data is used for testing, while the remaining 70% is used for training.

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3, random_state=42)

And change the cmap from Blue to Blues:

Visualize the confusion matrix

...
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")

0 replies

cvicens · 2025-02-12T19:41:06Z

cvicens
Feb 12, 2025
Collaborator

I used the chat to ask for a valid value of cmap.
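
Another way to find a valid value is to ask matplotlib for its registered colormap names. This is a minimal sketch, assuming matplotlib is installed; seaborn also registers a few colormaps of its own on import, which are not shown here.

```python
import matplotlib.pyplot as plt

# Names of the colormaps matplotlib currently has registered.
names = plt.colormaps()

print("Blue" in names)   # the value that triggered the error
print("Blues" in names)  # the value used in the fix
print([n for n in names if n.startswith("Blu")])
```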

0 replies

eartvit · 2025-02-12T21:56:50Z

eartvit
Feb 12, 2025

The lab requirement says to get both precision and accuracy above 98%. However, the notebook only tests accuracy, which is verified to be above 98% when the test/train split is 30/70. Precision is calculated with a different formula from the confusion matrix, precision = TP / (TP + FP), which yields only 95%. I am not sure that LogisticRegression can solve this; a different model might be required, like RF or XGBoost, or even an MLP-based architecture...
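
To illustrate how accuracy can clear 98% while precision = TP / (TP + FP) does not, here is a small worked example. The confusion-matrix counts are hypothetical and not taken from the lab's data.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Fraction of positive predictions that are actually positive."""
    return tp / (tp + fp)


# Hypothetical counts, not taken from the lab's data.
tp, fp, fn, tn = 95, 5, 3, 897

print(f"accuracy:  {accuracy(tp, fp, fn, tn):.2%}")   # 99.20%
print(f"precision: {precision(tp, fp):.2%}")          # 95.00%
```

When negatives dominate the test set, the large TN count lifts accuracy, while precision depends only on TP and FP.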

0 replies
