Commit 77161db: Publish package
Siddharth Saxena committed Mar 19, 2023 (0 parents)
Showing 98 changed files with 171,913 additions and 0 deletions.
47 changes: 47 additions & 0 deletions .github/workflows/python-publish.yml
@@ -0,0 +1,47 @@
```yaml
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries

# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.

name: PyPI Publish

on:
  push:
    branches-ignore:
      - 'develop'
      - 'feature/*'
    tags:
      - v**

permissions:
  contents: read

jobs:
  deploy:

    runs-on: ubuntu-18.04

    steps:
      - uses: actions/checkout@main
      - name: Set up Python
        uses: actions/setup-python@v1
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build setuptools wheel
      - name: Build package
        run: >-
          python
          setup.py
          bdist_wheel
          sdist
      - name: Publish package
        uses: pypa/gh-action-pypi-publish@main
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}
```
127 changes: 127 additions & 0 deletions README.md
@@ -0,0 +1,127 @@
# api_key_detector
Neural Network Based, Automatic API Key Detector

> Forked from https://github.com/alessandrodd/api_key_detector and published as a pip package.

A Multilayer-Perceptron-based system that identifies API Key strings with an accuracy of over 99%.

For technical details, [check out my thesis (_Automatic extraction of API Keys from Android applications_) and, in particular, **Chapter 3** of the work.](https://goo.gl/uryZeA)

## Requirements

- Python 3.5+
- Modules in requirements.txt (use pip3 to install)
```bash
pip install -r requirements.txt
```

## Installation

```bash
$ git clone https://github.com/alessandrodd/api_key_detector.git
$ pip3 install -r api_key_detector/requirements.txt
$ python3 -m api_key_detector
```

## Example Library Usage

```python
>>> from api_key_detector import detector
>>> test = ["justsomething", "reallynothingimportant", "AizaSyDtEV5rwG_F1jvyj6WVlOOzD2vZa8DEpLE","eqwioqweioqiwoe"]
>>> detector.detect_api_keys(test)
[False, False, True, False]
>>> detector.filter_api_keys(test)
['AizaSyDtEV5rwG_F1jvyj6WVlOOzD2vZa8DEpLE']
```

## Commandline Usage

A command-line interface can be used to exercise the library's functionality.

```bash
usage: api_key_detector [-h] [--debug] [--test] [--entropy] [--sequentiality]
                        [--gibberish] [--charset-length] [--words-percentage]
                        [--string STRING | -a | -e | -s | -g]
                        [--generate-training-set] [--plot-training-set]
                        [--api-key-files API_KEY_FILES [API_KEY_FILES ...]]
                        [--generic-text-files GENERIC_TEXT_FILES [GENERIC_TEXT_FILES ...]]
                        [--output-file DUMP_FILE] [--filter-apikeys]
                        [--detect-apikeys]

A python program that detects API Keys

optional arguments:
  -h, --help            show this help message and exit
  --debug               Print debug information
  --test                Test mode; calculates all features for strings in
                        stdin
  --entropy             Calculates the charset-normalized Shannon Entropy for
                        strings in stdin
  --sequentiality       Calculates the Sequentiality Index for strings in
                        stdin
  --gibberish           Calculates the Gibberish Index for strings in stdin
  --charset-length      Calculates the Induced Charset Length for strings in
                        stdin
  --words-percentage    Calculates the percentage of dictionary words for
                        each string in stdin
  --string STRING       Calculate all features for a single string, to be
                        used in conjunction with --test
  -a                    Sort in alphabetical order, ascending
  -e                    Sort by entropy, ascending
  -s                    Sort by sequentiality, ascending
  -g                    Sort by gibberish index, ascending
  --generate-training-set
                        Generate training set for string classifier. Needs
                        --api-key-files, --generic-text-files and
                        --output-file to be specified
  --plot-training-set   Generate a 3D scatterplot for the training set. Needs
                        --api-key-files, --generic-text-files and
                        --output-file to be specified
  --api-key-files API_KEY_FILES [API_KEY_FILES ...]
                        List of files containing valid API Key examples, one
                        per line
  --generic-text-files GENERIC_TEXT_FILES [GENERIC_TEXT_FILES ...]
                        List of files containing generic text examples that
                        don't contain any API Key
  --output-file DUMP_FILE
                        Where to output the training set file
  --filter-apikeys      Filter potential API Keys from strings in stdin
  --detect-apikeys      Detect potential API Keys from strings in stdin
```
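Two of the features listed above, the Induced Charset Length and the charset-normalized Shannon Entropy, can be sketched in plain Python. This is only an illustration of the idea under my own interpretation; the function names are mine, and the library's actual `entropy.py` and `charset.py` may compute these differently:

```python
import math

def induced_charset_size(s):
    # Infer the size of the charset the string was likely drawn from:
    # any lowercase letter implies all 26 lowercase letters were available,
    # any digit implies all 10 digits, and so on. (Hypothetical sketch.)
    size = 0
    if any(c.islower() for c in s):
        size += 26
    if any(c.isupper() for c in s):
        size += 26
    if any(c.isdigit() for c in s):
        size += 10
    size += len({c for c in s if not c.isalnum()})
    return size

def charset_normalized_entropy(s):
    # Shannon entropy of the string, divided by the maximum entropy of a
    # uniformly random string drawn from the induced charset.
    if not s:
        return 0.0
    n = len(s)
    counts = {}
    for c in s:
        counts[c] = counts.get(c, 0) + 1
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    max_entropy = math.log2(induced_charset_size(s))
    return entropy / max_entropy if max_entropy > 0 else 0.0

# Random-looking keys score much closer to 1 than repetitive text:
key_score = charset_normalized_entropy("AizaSyDtEV5rwG_F1jvyj6WVlOOzD2vZa8DEpLE")
text_score = charset_normalized_entropy("aabbaabbaabbaabb")
```

Normalizing by the induced charset (rather than the observed distinct characters) is what keeps repetitive strings like `aabbaabb` from scoring as maximally random.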
## Config File Explained
### config.json
**dump** => Where to save the trained Neural Network. Delete it to retrain the network.

**min_key_length** => The minimum length of an API Key.

**blacklists** => Text files containing strings (one per line) that should never be considered API Keys.

**wordlists** => Text files containing real words (one per line), used to detect words inside strings.

**word_content_threshold** => If the fraction of dictionary words in a candidate string reaches word_content_threshold, the string is discarded.

**api_learnsets** => Text files containing API Keys (one per line), used to train the Neural Network.

**text_learnsets** => Text files containing generic strings (no API Keys, one per line), used to train the Neural Network.

**good_test** => Same as api_learnsets, but used to test the Neural Network.

**bad_test** => Same as text_learnsets, but used to test the Neural Network.

**re_train** => If true, the Neural Network is re-trained during initialization.

**logging** => Configures logging capabilities; see [the Python logging HOWTO](https://docs.python.org/3/howto/logging.html).
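For illustration, a config.json consistent with the keys above might look like the fragment below. All file names and values here are hypothetical examples, not the package's shipped defaults:

```json
{
  "dump": "classifier.dump",
  "min_key_length": 16,
  "blacklists": ["blacklist.txt"],
  "wordlists": ["english_words.txt"],
  "word_content_threshold": 0.5,
  "api_learnsets": ["api_keys.txt"],
  "text_learnsets": ["generic_text.txt"],
  "good_test": ["api_keys_test.txt"],
  "bad_test": ["generic_text_test.txt"],
  "re_train": false,
  "logging": { "version": 1 }
}
```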
### gibberish_detector/config.json
**dump** => Where to save the trained model. Delete it to retrain the model.

**learnsets** => Text files containing language-specific text (e.g. books), used to compute the transition probabilities from each letter to the next.

**good_test** => Text files containing syntactically correct sentences, used to test the detector.

**bad_test** => Text files containing gibberish, used to test the detector.

**re_train** => If true, the transition probabilities are re-computed during initialization.
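The transition-probability idea behind the gibberish detector can be sketched as a simple character-bigram model. This is my own illustration under that assumption; the actual gibberish_detector implementation may differ in smoothing, alphabet, and scoring:

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def train_transitions(corpus):
    # Count letter-to-letter transitions in the training text, then turn
    # the counts into add-one-smoothed conditional probabilities P(b | a).
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(corpus, corpus[1:]):
        if a in ALPHABET and b in ALPHABET:
            counts[a][b] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values()) + len(ALPHABET)  # Laplace smoothing
        probs[a] = {b: (counts[a][b] + 1) / total for b in ALPHABET}
    return probs

def avg_log_prob(s, probs):
    # Average log transition probability of a string; higher values mean
    # the string looks more like the training language.
    pairs = [(a, b) for a, b in zip(s, s[1:]) if a in probs and b in probs[a]]
    if not pairs:
        return float("-inf")
    return sum(math.log(probs[a][b]) for a, b in pairs) / len(pairs)

# Toy corpus; real learnsets would be books or other long texts
corpus = "the quick brown fox jumps over the lazy dog and then the fox sleeps " * 50
model = train_transitions(corpus)
english_score = avg_log_prob("the fox and the dog", model)
gibberish_score = avg_log_prob("zqxjwv kqzx vjwq", model)
```

A threshold on this score separates language-like strings from gibberish, which is what the Gibberish Index feature feeds into the classifier.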
136 changes: 136 additions & 0 deletions api_key_detector.egg-info/PKG-INFO
@@ -0,0 +1,136 @@
Metadata-Version: 2.1
Name: api-key-detector
Version: 0.0.1
Summary: ML Based API Key Detector
Home-page: https://github.com/s1dds/api_key_detector
Author: Siddharth Saxena
Author-email: [email protected]
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# api_key_detector
Neural Network Based, Automatic API Key Detector

A Multilayer-Perceptron-based system that identifies API Key strings with an accuracy of over 99%.

For technical details, [check out my thesis (_Automatic extraction of API Keys from Android applications_) and, in particular, **Chapter 3** of the work.](https://goo.gl/uryZeA)

## Requirements

- Python 3.5+
- Modules in requirements.txt (use pip3 to install)
```bash
pip install -r requirements.txt
```

## Installation

```bash
$ git clone https://github.com/alessandrodd/api_key_detector.git
$ pip3 install -r api_key_detector/requirements.txt
$ python3 -m api_key_detector
```

## Example Library Usage

```python
>>> from api_key_detector import detector
>>> test = ["justsomething", "reallynothingimportant", "AizaSyDtEV5rwG_F1jvyj6WVlOOzD2vZa8DEpLE","eqwioqweioqiwoe"]
>>> detector.detect_api_keys(test)
[False, False, True, False]
>>> detector.filter_api_keys(test)
['AizaSyDtEV5rwG_F1jvyj6WVlOOzD2vZa8DEpLE']
```

## Commandline Usage

A command-line interface can be used to exercise the library's functionality.

```bash
usage: api_key_detector [-h] [--debug] [--test] [--entropy] [--sequentiality]
                        [--gibberish] [--charset-length] [--words-percentage]
                        [--string STRING | -a | -e | -s | -g]
                        [--generate-training-set] [--plot-training-set]
                        [--api-key-files API_KEY_FILES [API_KEY_FILES ...]]
                        [--generic-text-files GENERIC_TEXT_FILES [GENERIC_TEXT_FILES ...]]
                        [--output-file DUMP_FILE] [--filter-apikeys]
                        [--detect-apikeys]

A python program that detects API Keys

optional arguments:
  -h, --help            show this help message and exit
  --debug               Print debug information
  --test                Test mode; calculates all features for strings in
                        stdin
  --entropy             Calculates the charset-normalized Shannon Entropy for
                        strings in stdin
  --sequentiality       Calculates the Sequentiality Index for strings in
                        stdin
  --gibberish           Calculates the Gibberish Index for strings in stdin
  --charset-length      Calculates the Induced Charset Length for strings in
                        stdin
  --words-percentage    Calculates the percentage of dictionary words for
                        each string in stdin
  --string STRING       Calculate all features for a single string, to be
                        used in conjunction with --test
  -a                    Sort in alphabetical order, ascending
  -e                    Sort by entropy, ascending
  -s                    Sort by sequentiality, ascending
  -g                    Sort by gibberish index, ascending
  --generate-training-set
                        Generate training set for string classifier. Needs
                        --api-key-files, --generic-text-files and
                        --output-file to be specified
  --plot-training-set   Generate a 3D scatterplot for the training set. Needs
                        --api-key-files, --generic-text-files and
                        --output-file to be specified
  --api-key-files API_KEY_FILES [API_KEY_FILES ...]
                        List of files containing valid API Key examples, one
                        per line
  --generic-text-files GENERIC_TEXT_FILES [GENERIC_TEXT_FILES ...]
                        List of files containing generic text examples that
                        don't contain any API Key
  --output-file DUMP_FILE
                        Where to output the training set file
  --filter-apikeys      Filter potential API Keys from strings in stdin
  --detect-apikeys      Detect potential API Keys from strings in stdin
```

## Config File Explained
### config.json
**dump** => Where to save the trained Neural Network. Delete it to retrain the network.

**min_key_length** => The minimum length of an API Key.

**blacklists** => Text files containing strings (one per line) that should never be considered API Keys.

**wordlists** => Text files containing real words (one per line), used to detect words inside strings.

**word_content_threshold** => If the fraction of dictionary words in a candidate string reaches word_content_threshold, the string is discarded.

**api_learnsets** => Text files containing API Keys (one per line), used to train the Neural Network.

**text_learnsets** => Text files containing generic strings (no API Keys, one per line), used to train the Neural Network.

**good_test** => Same as api_learnsets, but used to test the Neural Network.

**bad_test** => Same as text_learnsets, but used to test the Neural Network.

**re_train** => If true, the Neural Network is re-trained during initialization.

**logging** => Configures logging capabilities; see [the Python logging HOWTO](https://docs.python.org/3/howto/logging.html).

### gibberish_detector/config.json
**dump** => Where to save the trained model. Delete it to retrain the model.

**learnsets** => Text files containing language-specific text (e.g. books), used to compute the transition probabilities from each letter to the next.

**good_test** => Text files containing syntactically correct sentences, used to test the detector.

**bad_test** => Text files containing gibberish, used to test the detector.

**re_train** => If true, the transition probabilities are re-computed during initialization.
24 changes: 24 additions & 0 deletions api_key_detector.egg-info/SOURCES.txt
@@ -0,0 +1,24 @@
README.md
setup.py
api_key_detector/__init__.py
api_key_detector/__main__.py
api_key_detector/charset.py
api_key_detector/classifier_singleton.py
api_key_detector/classifiers_test.py
api_key_detector/config.py
api_key_detector/dataset_plotter.py
api_key_detector/detector.py
api_key_detector/detector_config.py
api_key_detector/entropy.py
api_key_detector/neuralnets_test.py
api_key_detector/sequentiality.py
api_key_detector/string_classifier.py
api_key_detector/strings_filter.py
api_key_detector/strings_filter_singleton.py
api_key_detector/words_finder.py
api_key_detector/words_finder_singleton.py
api_key_detector.egg-info/PKG-INFO
api_key_detector.egg-info/SOURCES.txt
api_key_detector.egg-info/dependency_links.txt
api_key_detector.egg-info/requires.txt
api_key_detector.egg-info/top_level.txt
1 change: 1 addition & 0 deletions api_key_detector.egg-info/dependency_links.txt
@@ -0,0 +1 @@

7 changes: 7 additions & 0 deletions api_key_detector.egg-info/requires.txt
@@ -0,0 +1,7 @@
matplotlib==3.6.3
numpy==1.24.2
plotly==5.13.0
PyYAML
rstr==3.2.0
scikit-learn==1.2.1
scipy==1.10.0
1 change: 1 addition & 0 deletions api_key_detector.egg-info/top_level.txt
@@ -0,0 +1 @@
api_key_detector