Proceedings of the Korean Institute of Information Scientists and Engineers (KIISE), KCC 2022
- paper
- config
- data_utils
- dataset
- emse-data
  - model/hybrid: checkpoints, data, eval, config.yaml, default.yaml, log.txt
  - train: train_ast.json, train.token.ast, train.token.code, train.token.nl
  - test: test_ast.json, test.token.ast, test.token.code, test.token.nl
  - valid: valid_ast.json, valid.token.ast, valid.token.code, valid.token.nl
  - vocab: vocab.ast, vocab.code, vocab.nl
- code
// source code
public int entrySize(Object key, Object value) throws IllegalArgumentException {
    if (value == Token.TOMBSTONE) {
        return NUM_;
    }
    int size = HeapLRUCapacityController.this.getPerEntryOverhead();
    size += sizeof(key);
    size += sizeof(value);
    return size;
}
// token.code (one tokenized method per line; samples in the raw dump are separated by long '>>====' rules)
public int entrySize ( Object key , Object value ) throws IllegalArgumentException { if ( value == Token . TOMBSTONE ) { return NUM_ ; } int size = HeapLRUCapacityController . this . getPerEntryOverhead ( ) ; size += sizeof ( key ) ; size += sizeof ( value ) ; return size ; }
- ast.json
[{"id": 0, "type": "MethodDeclaration", "children": [1, 2, 4, 6, 13, 18, 23, 28], "value": "entrySize"}, {"id": 1, "type": "BasicType", "value": "int"}, {"id": 2, "type": "FormalParameter", "children": [3], "value": "key"}, {"id": 3, "type": "ReferenceType", "value": "Object"}, {"id": 4, "type": "FormalParameter", "children": [5], "value": "value"}, {"id": 5, "type": "ReferenceType", "value": "Object"}, {"id": 6, "type": "IfStatement", "children": [7, 10]}, {"id": 7, "type": "BinaryOperation", "children": [8, 9]}, {"id": 8, "type": "MemberReference", "value": "value"}, {"id": 9, "type": "MemberReference", "value": "Token.TOMBSTONE"}, {"id": 10, "type": "BlockStatement", "children": [11], "value": "None"}, {"id": 11, "type": "ReturnStatement", "children": [12], "value": "return"}, {"id": 12, "type": "MemberReference", "value": "NUM_"}, {"id": 13, "type": "LocalVariableDeclaration", "children": [14, 15], "value": "int"}, {"id": 14, "type": "BasicType", "value": "int"}, {"id": 15, "type": "VariableDeclarator", "children": [16], "value": "size"}, {"id": 16, "type": "This", "children": [17], "value": "HeapLRUCapacityController.this.getPerEntryOverhead"}, {"id": 17, "type": "MethodInvocation", "value": "."}, {"id": 18, "type": "StatementExpression", "children": [19]}, {"id": 19, "type": "Assignment", "children": [20, 21]}, {"id": 20, "type": "MemberReference", "value": "size"}, {"id": 21, "type": "MethodInvocation", "children": [22], "value": "sizeof"}, {"id": 22, "type": "MemberReference", "value": "key"}, {"id": 23, "type": "StatementExpression", "children": [24]}, {"id": 24, "type": "Assignment", "children": [25, 26]}, {"id": 25, "type": "MemberReference", "value": "size"}, {"id": 26, "type": "MethodInvocation", "children": [27], "value": "sizeof"}, {"id": 27, "type": "MemberReference", "value": "value"}, {"id": 28, "type": "ReturnStatement", "children": [29], "value": "return"}, {"id": 29, "type": "MemberReference", "value": "size"}]
- token.ast
( MethodDeclaration ( BasicType ) BasicType ( FormalParameter ( ReferenceType ) ReferenceType ) FormalParameter ( FormalParameter ( ReferenceType ) ReferenceType ) FormalParameter ( IfStatement ( BinaryOperation ( MemberReference ) MemberReference ( MemberReference ) MemberReference ) BinaryOperation ( BlockStatement ( ReturnStatement ( MemberReference ) MemberReference ) ReturnStatement ) BlockStatement ) IfStatement ( LocalVariableDeclaration ( BasicType ) BasicType ( VariableDeclarator ( This ( MethodInvocation ) MethodInvocation ) This ) VariableDeclarator ) LocalVariableDeclaration ( StatementExpression ( Assignment ( MemberReference ) MemberReference ( MethodInvocation ( MemberReference ) MemberReference ) MethodInvocation ) Assignment ) StatementExpression ( StatementExpression ( Assignment ( MemberReference ) MemberReference ( MethodInvocation ( MemberReference ) MemberReference ) MethodInvocation ) Assignment ) StatementExpression ( ReturnStatement ( MemberReference ) MemberReference ) ReturnStatement ) MethodDeclaration
- token.nl
"as far as we re concerned all entries have the same size"
- img
- projects
- scripts
- source code
- __main__.py
- beam_search.py
- evaluations.py
- models.py
- rnn.py
- seq2seq_model.py
- translation_model.py
- utils.py
- python3 main.py config.yaml --train -v
- Hyperparameters can be modified in config.yaml
- Adjust them as appropriate before training
parser.add_argument('config', help='load a configuration file in the YAML format')
parser.add_argument('-v', '--verbose', action='store_true', help='verbose mode')
parser.add_argument('--debug', action='store_true', help='debug mode')
# using 'store_const' instead of 'store_true' so that the default value is `None` instead of `False`
parser.add_argument('--reset', action='store_const', const=True, help="reset model (don't load any checkpoint)")
parser.add_argument('--reset-learning-rate', action='store_const', const=True, help='reset learning rate')
parser.add_argument('--learning-rate', type=float, help='custom learning rate (triggers `reset-learning-rate`)')
parser.add_argument('--purge', action='store_true', help='remove previous model files')
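The help text on `--learning-rate` says it "triggers `reset-learning-rate`", which suggests the flags interact roughly as below. This is a sketch of the plausible handling, not verbatim repo code:

args = parser.parse_args()
# a custom learning rate implies resetting the stored one, per the help text
if args.learning_rate is not None:
    args.reset_learning_rate = True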
- python3 main.py config.yaml --decode "data_dir"
- data_dir: a folder containing ast.json, token.ast, token.code, and token.nl
- python3 main.py config.yaml --eval "data_dir"
- data_dir: a folder containing ast.json, token.ast, token.code, and token.nl
- python3 get_ast.py source.code ast.json
// code
public boolean doesNotHaveIds() {
    return getIds() == null || getIds().getIds().isEmpty();
}
// AST
[
{"id": 0, "type": "MethodDeclaration", "children": [1, 2], "value": "doesNotHaveIds"},
{"id": 1, "type": "BasicType", "value": "boolean"},
{"id": 2, "type": "ReturnStatement", "children": [3], "value": "return"},
{"id": 3, "type": "BinaryOperation", "children": [4, 7]},
{"id": 4, "type": "BinaryOperation", "children": [5, 6]},
{"id": 5, "type": "MethodInvocation", "value": "getIds"},
{"id": 6, "type": "Literal", "value": "null"},
{"id": 7, "type": "MethodInvocation", "children": [8, 9], "value": "getIds"},
{"id": 8, "type": "MethodInvocation", "value": "."},
{"id": 9, "type": "MethodInvocation", "value": "."}
]
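get_ast.py presumably relies on a Java parser. Below is a minimal sketch of how the flat JSON above could be produced with the javalang library; the dummy-class wrapping and the name-only value extraction are assumptions, simplified from whatever the script actually does:

import json
import javalang

def method_to_flat_ast(method_code):
    # javalang parses whole compilation units, so wrap the lone method in a class
    tree = javalang.parse.parse("class Dummy { %s }" % method_code)
    method = tree.types[0].methods[0]
    nodes = []

    def visit(node):
        # assign ids in pre-order, matching the JSON format above
        entry = {"id": len(nodes), "type": type(node).__name__}
        nodes.append(entry)
        child_ids = []
        for child in node.children:
            items = child if isinstance(child, list) else [child]
            for item in items:
                if isinstance(item, javalang.ast.Node):
                    child_ids.append(visit(item))
        if child_ids:
            entry["children"] = child_ids
        if getattr(node, "name", None):  # the real script also extracts other value tokens
            entry["value"] = node.name
        return entry["id"]

    visit(method)
    return json.dumps(nodes)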
- python3 ast_traversal.py
- def get_sbt_structure
// SBT
( MethodDeclaration ( BasicType ) BasicType ( FormalParameter ( ReferenceType ) ReferenceType ) FormalParameter ( FormalParameter ( ReferenceType ) ReferenceType ) FormalParameter ( IfStatement ( BinaryOperation ( MemberReference ) MemberReference ( MemberReference ) MemberReference ) BinaryOperation ( BlockStatement ( ReturnStatement ( MemberReference ) MemberReference ) ReturnStatement ) BlockStatement ) IfStatement ( LocalVariableDeclaration ( BasicType ) BasicType ( VariableDeclarator ( This ( MethodInvocation ) MethodInvocation ) This ) VariableDeclarator ) LocalVariableDeclaration ( StatementExpression ( Assignment ( MemberReference ) MemberReference ( MethodInvocation ( MemberReference ) MemberReference ) MethodInvocation ) Assignment ) StatementExpression ( StatementExpression ( Assignment ( MemberReference ) MemberReference ( MethodInvocation ( MemberReference ) MemberReference ) MethodInvocation ) Assignment ) StatementExpression ( ReturnStatement ( MemberReference ) MemberReference ) ReturnStatement ) MethodDeclaration
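The SBT string is a pre-order walk that brackets each node with its type: "( Type ...children... ) Type". A minimal sketch of `get_sbt_structure` over the flat AST JSON (reconstructed from the output format, not the repo's exact code):

def get_sbt(nodes, idx=0):
    # emit "( Type <children...> ) Type" for each AST node, depth-first
    node = nodes[idx]
    tokens = ["(", node["type"]]
    for child_id in node.get("children", []):
        tokens += get_sbt(nodes, child_id)
    tokens += [")", node["type"]]
    return tokens

# one token.ast line: " ".join(get_sbt(json.loads(ast_json_line)))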
- python3 ast_traversal.py
- def get_sbtcode_structure
// SBT + CODE
( MethodDeclaration entrySize ( BasicType int ) BasicType ( FormalParameter key ( ReferenceType Object ) ReferenceType ) FormalParameter ( FormalParameter value ( ReferenceType Object ) ReferenceType ) FormalParameter ( IfStatement if ( BinaryOperation ( MemberReference value ) MemberReference ( MemberReference Token.TOMBSTONE ) MemberReference ) BinaryOperation ( BlockStatement { ( ReturnStatement return ( MemberReference NUM_ ) MemberReference ) ReturnStatement ) BlockStatement ) IfStatement ( LocalVariableDeclaration int ( BasicType int ) BasicType ( VariableDeclarator size ( This HeapLRUCapacityController.this.getPerEntryOverhead ( MethodInvocation . ) MethodInvocation ) This ) VariableDeclarator ) LocalVariableDeclaration ( StatementExpression size ( Assignment ( MemberReference size ) MemberReference ( MethodInvocation sizeof ( MemberReference key ) MemberReference ) MethodInvocation ) Assignment ) StatementExpression ( StatementExpression size ( Assignment ( MemberReference size ) MemberReference ( MethodInvocation sizeof ( MemberReference value ) MemberReference ) MethodInvocation ) Assignment ) StatementExpression ( ReturnStatement return ( MemberReference size ) MemberReference ) ReturnStatement ) MethodDeclaration
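The SBT+code variant (`get_sbtcode_structure`) is the same traversal with each node's token value appended after its type. The real script evidently also injects source tokens such as `if` and `{` for statement nodes; this sketch omits that detail:

def get_sbt_code(nodes, idx=0):
    node = nodes[idx]
    tokens = ["(", node["type"]]
    value = node.get("value")
    if value and value != "None":
        tokens.append(value)  # e.g. "( MethodDeclaration entrySize ... ) MethodDeclaration"
    for child_id in node.get("children", []):
        tokens += get_sbt_code(nodes, child_id)
    tokens += [")", node["type"]]
    return tokens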
- config_default.yaml
# SGD parameters
learning_rate: 0.5
sgd_learning_rate: 1.0
learning_rate_decay_factor: 0.99
# training parameters
max_gradient_norm: 5.0
steps_per_checkpoint: 2000
steps_per_eval: 2000
eval_burn_in: 0
max_steps: 0
max_epochs: 50
keep_best: 5
feed_previous: 0.0
optimizer: sgd
moving_average: null
# batch iteration parameters
batch_size: 100
batch_mode: random
shuffle: True
read_ahead: 1
reverse_input: True
# model (each one of these settings can be defined specifically in 'encoders' and 'decoders', or generally here)
cell_size: 512
embedding_size: 512
attn_size: 256
layers: 1
cell_type: LSTM
character_level: False
truncate_lines: True
# data
max_train_size: 0
max_dev_size: 0
max_test_size: 0
data_dir: ../emse-data(ast_only)
model_dir: ../emse-data(ast_only)/model/default
train_prefix: train
script_dir: scripts
dev_prefix: test
vocab_prefix: vocab
checkpoints: []
# decoding
score_function: nltk_sentence_bleu
post_process_script: null
remove_unk: False
beam_size: 1
# general
encoders:
  - name: ast
    max_len: 500
    attention_type: global
decoders:
  - name: nl
    max_len: 30
- Performance check
# Our run: step 222000, **epoch 50**, learning rate 0.306, step-time 0.791, loss 10.135
# test eval: loss 36.39; starting decoding; test avg_score=0.2019 (**BLEU**)
# test folder: avg_score 0.2026
# Result reported in the paper: BLEU 38.17
- config.yaml
# SGD parameters
learning_rate: 0.5  # initial learning rate
sgd_learning_rate: 1.0  # SGD can start at a different learning rate (useful for switching between Adam and SGD)
learning_rate_decay_factor: 0.99  # decay the learning rate by this factor at a given frequency
# training parameters
max_gradient_norm: 5.0  # clip gradients to this norm (prevents exploding gradient)
steps_per_checkpoint: 2000  # number of SGD updates between each checkpoint
steps_per_eval: 2000  # number of SGD updates between each BLEU eval (on dev set)
eval_burn_in: 0  # minimum number of updates before starting BLEU eval
max_steps: 0  # maximum number of updates before stopping, 600000 -> 0
max_epochs: 50  # maximum number of epochs before stopping, 100 -> 50
keep_best: 5  # number of best checkpoints to keep (based on BLEU score on dev set)
feed_previous: 0.0  # randomly feed previous output instead of ground truth to decoder during training ([0,1] probability)
optimizer: sgd  # which training algorithm to use ('sgd', 'adadelta', or 'adam')
moving_average: null  # TODO
# batch iteration parameters
batch_size: 128  # batch size (during training and greedy decoding), 64 -> 128
batch_mode: standard  # standard (cycle through train set) or random (sample from train set)
shuffle: True  # shuffle dataset at each new epoch
read_ahead: 1  # number of batches to read ahead and sort by sequence length (can speed up training)
reverse_input: True  # reverse input sequences
# model (each one of these settings can be defined specifically in 'encoders' and 'decoders', or generally here)
cell_size: 256  # size of the RNN cells
embedding_size: 256  # size of the embeddings
attn_size: 128  # size of the attention layer
layers: 1  # number of RNN layers per encoder and decoder
cell_type: GRU  # LSTM, GRU, DropoutGRU
character_level: False  # character-level sequences
truncate_lines: True  # if True, truncate lines which are too long; otherwise just drop them
# encoder settings
bidir: False  # use bidirectional encoders
train_initial_states: True  # whether the initial states of the encoder should be trainable parameters
bidir_projection: False  # project bidirectional encoder states to cell_size (or just keep the concatenation)
time_pooling: null  # perform time pooling (skip states) between the layers of the encoder (list of layers - 1 ratios)
pooling_avg: True  # average or skip consecutive states
binary: False  # use binary input for the encoder (no vocab and no embeddings, see utils.read_binary_features)
attn_filters: 0
attn_filter_length: 0
input_layers: null  # list of fully connected layer sizes, applied before the encoder
attn_temperature: 1.0  # 1.0: true softmax (low values: uniform distribution, high values: argmax)
final_state: last  # last (default), concat_last, average
highway_layers: 0  # number of highway layers before the encoder (after convolutions and maxout)
# decoder settings
tie_embeddings: False  # use transpose of the embedding matrix for output projection (requires 'output_extra_proj')
use_previous_word: True  # use previous word when predicting a new word
attn_prev_word: False  # use the previous word in the attention model
softmax_temperature: 1.0  # TODO: temperature of the output softmax
pred_edits: False  # output is a sequence of edits, apply those edits before decoding/evaluating
conditional_rnn: False  # two-layer decoder, where the 1st layer is used for attention, and the 2nd layer for prediction
generate_first: True  # generate next word before updating state (look -> generate -> update)
update_first: False  # update state before looking and generating next word
rnn_feed_attn: True  # feed attention context to the RNN's transition function
use_lstm_full_state: False  # use LSTM's full state for attention and next word prediction
pred_embed_proj: True  # project decoder output to embedding size before projecting to vocab size
pred_deep_layer: False  # add a non-linear transformation just before softmax
pred_maxout_layer: True  # use a maxout layer just before the vocabulary projection and softmax
aggregation_method: sum  # how to combine the attention contexts of multiple encoders (concat, sum)
# data
max_train_size: 0  # maximum size of the training data (0 for unlimited)
max_dev_size: 0  # maximum size of the dev data
max_test_size: 0  # maximum size of the test data
data_dir: ../emse-data(original)  # directory containing the training data
model_dir: ../emse-data(original)/model/hybrid
train_prefix: train  # name of the training corpus
script_dir: scripts  # directory where the scripts are kept (in particular the scoring scripts)
dev_prefix: test  # names of the development corpora
vocab_prefix: vocab  # name of the vocabulary files
checkpoints: []  # list of checkpoints to load (in this specific order) after main checkpoint
# decoding
score_function: nltk_sentence_bleu  # name of the main scoring function, inside 'evaluation.py' (used for selecting models)
post_process_script: null  # path to post-processing script (called before evaluating)
remove_unk: False  # remove UNK symbols from the decoder output
beam_size: 5  # beam size for decoding (decoder is greedy by default)
ensemble: False  # use an ensemble of models while decoding (specified by the --checkpoints parameter)
output: null  # output file for decoding (writes to standard output by default)
len_normalization: 1.0  # length normalization coefficient used in beam-search decoder
early_stopping: True  # reduce beam size each time a finished hypothesis is encountered (affects decoding speed)
raw_output: False  # output translation hypotheses without any post-processing
average: False  # like ensemble, but instead of averaging the log-probs, average all parameters
pred_edits: False
# general
encoders:  # this is a list (you can specify several encoders)
  - name: code  # each encoder or decoder has a name (used for naming variables) and an extension (for files)
    max_len: 200  # max_len of api
    attention_type: global
  - name: ast
    max_len: 500
    attention_type: global
decoders:  # each encoder or decoder can redefine its own values for a number of parameters,
  - name: nl  # including `cell_size`, `embedding_size` and `attn_size`
    max_len: 30
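Since `score_function` is `nltk_sentence_bleu`, the `avg_score` values reported below are presumably the mean of smoothed sentence-level BLEU over the test set, roughly as follows. The smoothing method is an assumption, not lifted from the repo's evaluation.py:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def avg_sentence_bleu(hypotheses, references):
    smooth = SmoothingFunction().method4  # assumption: smoothing is needed for short comments
    scores = [sentence_bleu([ref.split()], hyp.split(), smoothing_function=smooth)
              for hyp, ref in zip(hypotheses, references)]
    return sum(scores) / len(scores)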
- Performance check
# Our run: step 174000, **epoch 50**, learning rate 0.306, step-time 0.951, loss 8.083
# test eval: loss 33.39; test avg_score=0.3806 (**BLEU**)
# test folder: avg_score 0.3820
- config_park.yaml
# The settings above are identical to DeepCom (default)
# data
max_train_size: 0  # maximum size of the training data (0 for unlimited)
max_dev_size: 0  # maximum size of the dev data
max_test_size: 0  # maximum size of the test data
data_dir: ../emse-data(sbt_code)  # directory containing the training data
model_dir: ../emse-data(sbt_code)/model/default
train_prefix: train  # name of the training corpus
script_dir: scripts  # directory where the scripts are kept (in particular the scoring scripts)
dev_prefix: test  # names of the development corpora
vocab_prefix: vocab  # name of the vocabulary files
checkpoints: []  # list of checkpoints to load (in this specific order) after main checkpoint
# general
encoders:  # this is a list (you can specify several encoders)
  - name: ast  # each encoder or decoder has a name (used for naming variables) and an extension (for files) -> 'code'
    max_len: 500  # max_len of api -> '200'
    attention_type: global
decoders:  # each encoder or decoder can redefine its own values for a number of parameters,
  - name: nl  # including `cell_size`, `embedding_size` and `attn_size`
    max_len: 30
- Performance check
# Our run: step 222000, epoch 50, learning rate 0.306, step-time 1.031, loss 10.060
# test eval: loss 36.92; starting decoding; test avg_score=0.1629 (**BLEU**)
# test folder: avg_score 0.1626
- config_hybrid_park.yaml
# The settings above are identical to Hybrid-DeepCom (default)
# data
max_train_size: 0  # maximum size of the training data (0 for unlimited)
max_dev_size: 0  # maximum size of the dev data
max_test_size: 0  # maximum size of the test data
data_dir: ../emse-data(sbt_code_hb)  # directory containing the training data
model_dir: ../emse-data(sbt_code_hb)/model/hybrid
train_prefix: train  # name of the training corpus
script_dir: scripts  # directory where the scripts are kept (in particular the scoring scripts)
dev_prefix: test  # names of the development corpora
vocab_prefix: vocab  # name of the vocabulary files
checkpoints: []  # list of checkpoints to load (in this specific order) after main checkpoint
# decoding
score_function: nltk_sentence_bleu  # name of the main scoring function, inside 'evaluation.py' (used for selecting models)
post_process_script: null  # path to post-processing script (called before evaluating)
remove_unk: False  # remove UNK symbols from the decoder output
beam_size: 5  # beam size for decoding (decoder is greedy by default)
ensemble: False  # use an ensemble of models while decoding (specified by the --checkpoints parameter)
output: null  # output file for decoding (writes to standard output by default)
len_normalization: 1.0  # length normalization coefficient used in beam-search decoder
early_stopping: True  # reduce beam size each time a finished hypothesis is encountered (affects decoding speed)
raw_output: False  # output translation hypotheses without any post-processing
average: False  # like ensemble, but instead of averaging the log-probs, average all parameters
pred_edits: False
# general
encoders:  # this is a list (you can specify several encoders)
  - name: code  # each encoder or decoder has a name (used for naming variables) and an extension (for files)
    max_len: 200  # max_len of api
    attention_type: global
  - name: ast
    max_len: 500
    attention_type: global
decoders:  # each encoder or decoder can redefine its own values for a number of parameters,
  - name: nl  # including `cell_size`, `embedding_size` and `attn_size`
    max_len: 30
- Performance check
step 174000, epoch 50, learning rate 0.306, step-time 1.163, loss 8.117
test eval: loss 33.40; starting decoding; test avg_score=0.3788 (**BLEU**)
test folder: avg_score 0.3798
- config_park_simSBT.yaml
# The settings above are identical to DeepCom (default)
# data
max_train_size: 0  # maximum size of the training data (0 for unlimited)
max_dev_size: 0  # maximum size of the dev data
max_test_size: 0  # maximum size of the test data
data_dir: ../emse-data(simsbt_code)  # directory containing the training data
model_dir: ../emse-data(simsbt_code)/model/default
train_prefix: train  # name of the training corpus
script_dir: scripts  # directory where the scripts are kept (in particular the scoring scripts)
dev_prefix: test  # names of the development corpora
vocab_prefix: vocab  # name of the vocabulary files
checkpoints: []  # list of checkpoints to load (in this specific order) after main checkpoint
# general
encoders:  # this is a list (you can specify several encoders)
  - name: ast  # each encoder or decoder has a name (used for naming variables) and an extension (for files) -> 'code'
    max_len: 500  # max_len of api -> '200'
    attention_type: global
decoders:  # each encoder or decoder can redefine its own values for a number of parameters,
  - name: nl  # including `cell_size`, `embedding_size` and `attn_size`
    max_len: 30
- Performance check
step 222000, epoch 50, learning rate 0.306, step-time 0.620, loss 9.092
test eval: loss 36.50; starting decoding; test avg_score=0.2424 (**BLEU**)
test folder: avg_score 0.2417
- config_hybrid_park_simSBT.yaml
# The settings above are identical to Hybrid-DeepCom (default)
# data
max_train_size: 0  # maximum size of the training data (0 for unlimited)
max_dev_size: 0  # maximum size of the dev data
max_test_size: 0  # maximum size of the test data
data_dir: ../emse-data(simsbt_code_hb)  # directory containing the training data
model_dir: ../emse-data(simsbt_code_hb)/model/hybrid
train_prefix: train  # name of the training corpus
script_dir: scripts  # directory where the scripts are kept (in particular the scoring scripts)
dev_prefix: test  # names of the development corpora
vocab_prefix: vocab  # name of the vocabulary files
checkpoints: []  # list of checkpoints to load (in this specific order) after main checkpoint
# decoding
score_function: nltk_sentence_bleu  # name of the main scoring function, inside 'evaluation.py' (used for selecting models)
post_process_script: null  # path to post-processing script (called before evaluating)
remove_unk: False  # remove UNK symbols from the decoder output
beam_size: 5  # beam size for decoding (decoder is greedy by default)
ensemble: False  # use an ensemble of models while decoding (specified by the --checkpoints parameter)
output: null  # output file for decoding (writes to standard output by default)
len_normalization: 1.0  # length normalization coefficient used in beam-search decoder
early_stopping: True  # reduce beam size each time a finished hypothesis is encountered (affects decoding speed)
raw_output: False  # output translation hypotheses without any post-processing
average: False  # like ensemble, but instead of averaging the log-probs, average all parameters
pred_edits: False
# general
encoders:  # this is a list (you can specify several encoders)
  - name: code  # each encoder or decoder has a name (used for naming variables) and an extension (for files)
    max_len: 200  # max_len of api
    attention_type: global
  - name: ast
    max_len: 500
    attention_type: global
decoders:  # each encoder or decoder can redefine its own values for a number of parameters,
  - name: nl  # including `cell_size`, `embedding_size` and `attn_size`
    max_len: 30
- Performance check
step 174000, epoch 50, learning rate 0.306, step-time 0.756, loss 7.984
test eval: loss 33.54; starting decoding; test avg_score=0.3848 (**BLEU**)
test folder: avg_score 0.3849
- Training Details (seCNN)
- Trained on the original data rather than the data_RQ1 data
- Performance check
After 30000 steps, BLEU in test: 0.32178
- Training Details (seTransformer)
- Trained on the original data rather than the data_RQ1 data
- Performance check
After 501100 steps, rate is 0.00001, cost is 0.00151. In iterator: 229. nowCBleu: 0.41729, maxCBleu: 0.41747, nowSBleu: 0.44359, maxSBleu: 0.44359
| | DeepCom | DeepCom (sbtcode) | H-DeepCom | H-DeepCom (sbtcode) | DeepCom (our model, simSBT+code) | H-DeepCom (our model, simSBT+code) | seCNN (default) | seTransformer |
|---|---|---|---|---|---|---|---|---|
| BLEU | 0.2026 | 0.1626 | 0.3820 | 0.3798 | 0.2417 | 0.3849 | 0.32178 | 0.44359 |
| METEOR | 0.3172 | 0.2741 | 0.5126 | 0.5105 | 0.3543 | 0.5164 | | |