-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ajout de nouveaux modèles de fraude #2792
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis pull request introduces several configuration, formatting, and feature enhancements. SQLFluff now includes new indentation settings, and a new incremental DBT model is added to analyze simultaneous group trips with defined materialization and CTE logic. Minor formatting adjustments were made to improve SQL readability in an existing fraud model. Additionally, a new table entry for carpool sources is introduced, a pre-commit hook configuration is updated, and various analytical code cells are added to a Jupyter notebook. Changes
Suggested reviewers
Poem
Tip 🌐 Web search-backed reviews and chat
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms (1)
🔇 Additional comments (15)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
dbt/models/sources/carpool.yml (1)
222-223
: Add column definitions for terms_violation_error_labels table.The new table is added but lacks column definitions. This could make it harder for other developers to understand the schema.
Add column definitions similar to other tables:
- name: terms_violation_error_labels columns: - name: _id data_type: integer - name: carpool_id data_type: integer - name: labels data_type: jsonb
🧹 Nitpick comments (3)
dbt/models/fraud/trips/simultaneous_group_trips.sql (3)
25-45
: Consider more descriptive column aliases.The logic is correct, but column names could be more descriptive:
total_amount
could betotal_incentive_amount
num_incitations
could benum_incentives
for consistency with English termsselect carpool_id, - sum(i.amount) as total_amount, - count(distinct i._id) as num_incitations + sum(i.amount) as total_incentive_amount, + count(distinct i._id) as num_incentives
47-115
: Add documentation for fraud detection logic.The groups CTE implements complex fraud detection logic but would benefit from:
- SQL comments explaining the significance of metrics
- Documentation for the threshold of 3 journeys
- Consider parameterizing the journey threshold for flexibility
groups as ( + -- Identify potential fraud by grouping similar journeys + -- Metrics include revenue analysis, fraud status, and terms violations select g.start_geo_code, g.end_geo_code, c.start_datetime, ( c.distance / 50 )::int as distance_50m, count( * ) as num_journeys, ... having - count(*) >= 3 + count(*) >= {{ var('min_suspicious_journeys', 3) }} -- Minimum journeys to consider as suspicious group )
117-133
: Consider including fraud_labels in output.The fraud_labels field is aggregated in the groups CTE but not included in the final output. This field could provide valuable context for fraud analysis.
select start_geo_code, end_geo_code, start_datetime, distance_50m, num_journeys, total_driver_revenue, total_passengers_contributions, total_incentives, num_incitations, operators, num_fraud_journeys, num_anomaly_journeys, - has_fraud_labels + has_fraud_labels, + fraud_labels
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
dbt/.sqlfluff
(1 hunks)dbt/models/fraud/trips/simultaneous_group_trips.sql
(1 hunks)dbt/models/fraud/users/users_intraday_role_change_stats.sql
(1 hunks)dbt/models/sources/carpool.yml
(1 hunks)notebooks/.pre-commit-config.yaml
(1 hunks)notebooks/analytics/cee/cee_analysis.ipynb
(89 hunks)
✅ Files skipped from review due to trivial changes (1)
- dbt/models/fraud/users/users_intraday_role_change_stats.sql
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: CodeQL (javascript)
- GitHub Check: Test integration
🔇 Additional comments (92)
notebooks/analytics/cee/cee_analysis.ipynb (89)
8-23
: No functional changes detected.
44-62
: Ensure environment variable handling.
We see usage ofDB_URL_RPC
. Consider verifying that it's set, and handle missing or invalid values gracefully to avoid runtime errors.
69-80
: Verify import of Path.
We seeOUTPUT_PATH = Path("outputs")
; confirm thatPath
(likely frompathlib
) is imported above or in a hidden cell.
86-99
: Check for SQLAlchemy imports.
You're usingcreate_engine(DATABASE_URL)
; ensurefrom sqlalchemy import create_engine
is available.
115-133
: Purely parameter-related lines.
No functional code change present.
173-173
: Comment only.
445-445
: Comment only.
503-503
: Markdown heading only.
574-574
: Markdown heading only.
768-768
: Markdown heading only.
851-851
: Markdown heading only.
955-955
: Markdown heading only.
973-973
: Markdown heading only.
1079-1079
: Markdown heading only.
1127-1127
: Markdown heading only.
1195-1195
: Markdown heading only.
1353-1353
: Markdown heading only.
1383-1383
: Markdown heading only.
1409-1409
: Minor string-format code.
No issues evident.
1470-1470
: Minor code snippet handling NaN.
No issues evident.
1970-1970
: Markdown heading only.
2016-2016
: Markdown heading only.
2073-2073
: Markdown heading only.
2225-2225
: Markdown heading only.
2293-2293
: Minor code snippet handling NaN.
No issues evident.
2686-2686
: Markdown heading only.
2731-2731
: Code comment only.
2749-2749
: Code comment only.
2771-2771
: Markdown heading only.
2822-2822
: Markdown heading only.
2962-2962
: Markdown heading only.
2988-2988
: Code comment only.
3010-3010
: Markdown heading only.
3030-3030
: Code comment only.
3072-3072
: Code comment only.
3672-3672
: Markdown heading only.
3722-3722
: Markdown heading only.
3777-3777
: Markdown heading only.
3960-3960
: Markdown heading only.
3999-3999
: Minor code snippet handling NaN.
No issues evident.
4371-4371
: Markdown heading only.
4429-4429
: Markdown heading only.
4787-4787
: Markdown heading only.
4865-4865
: Markdown heading only.
4944-4944
: Markdown heading only.
5539-5547
: Markdown headings only.
5629-5629
: Markdown heading only.
5690-5690
: Markdown heading only.
5735-5735
: Markdown heading only.
5772-5772
: Check sort criteria.
.str.reverse()
might produce reversed alphabetical order. Verify this is intentional.
5796-5796
: Markdown heading only.
5832-5832
: Markdown heading only.
5914-5914
: Minor code snippet handling NaN.
No issues evident.
6104-6104
: Markdown heading only.
6179-6179
: Markdown heading only.
6259-6259
: Markdown heading only.
6561-6561
: Markdown heading only.
6664-6664
: Markdown heading only.
6766-6766
: Markdown heading only.
6868-6868
: Markdown heading only.
7003-7241
: Review SQL usage & seat multiplier.
• Verify no user input is concatenated into the query to avoid SQL injection.
• You sumdistance * seats
; confirm thatseats=0
or1
is handled correctly to avoid under/overcounting.
7423-7423
: Markdown heading only.
7687-7687
: Markdown heading only.
7767-7767
: Markdown heading only.
7788-7788
: Markdown heading only.
7813-7813
: Markdown heading only.
7957-7957
: Markdown heading only.
8019-8019
: Markdown heading only.
8096-8096
: Minor code snippet.
No issues evident.
8196-8196
: Markdown heading only.
8373-8373
: Plot layout snippet only.
No issues evident.
8382-8382
: Markdown heading only.
8471-8471
: Markdown heading only.
8497-8501
: Styling for discrete color map.
No concerns.
8692-8692
: Markdown heading only.
8716-8716
: Markdown heading only.
8781-8781
: Markdown heading only.
8832-8832
: Markdown heading only.
8963-8963
: Markdown heading only.
9035-9035
: Markdown heading only.
9130-9130
: Markdown heading only.
9199-9199
: Markdown heading only.
9269-9269
: Markdown heading only.
9979-9979
: Markdown heading only.
10100-10100
: Markdown heading only.
10303-10303
: Markdown heading only.
10547-10547
: Markdown heading only.
10649-10649
: Markdown heading only.
10839-10839
: Notebook metadata only.dbt/models/fraud/trips/simultaneous_group_trips.sql (1)
1-23
: LGTM! Well-structured incremental model configuration.The configuration is well-designed with:
- Appropriate incremental materialization for time-series data
- Composite unique key to prevent duplicates
- Matching index for performance optimization
dbt/.sqlfluff (1)
25-27
: LGTM! Consistent indentation settings.The indentation configuration with 2 spaces aligns with common SQL formatting practices.
notebooks/.pre-commit-config.yaml (1)
7-7
: LGTM! Appropriate hook timing.Moving the notebook output clearing to pre-commit stage is better as it ensures clean notebooks before the commit is created.
Ajout de nouveaux modèles de fraude permettant de détecter un pattern de groupe de trajets suspicieux.
Réorganisation du dossier de modèles
fraud
pour une plus grande clartée.Summary by CodeRabbit
New Features
Refactor
Bug Fixes