graykode
Committed by GitHub

Merge pull request #1 from graykode/0.1.0

JavaScript Language is supported!!
...@@ -2,12 +2,15 @@ language: python ...@@ -2,12 +2,15 @@ language: python
2 python: 2 python:
3 - "3.6" 3 - "3.6"
4 4
5 +env:
6 + - LANGUAGE="py"
7 +
5 services: 8 services:
6 - docker 9 - docker
7 10
8 before_install: 11 before_install:
9 - - docker pull graykode/commit-autosuggestions 12 + - docker pull graykode/commit-autosuggestions:${LANGUAGE}
10 - - docker run -it -d -p 5000:5000 --restart always graykode/commit-autosuggestions 13 + - docker run -it -d -p 5000:5000 --restart always graykode/commit-autosuggestions:${LANGUAGE}
11 14
12 # command to install dependencies 15 # command to install dependencies
13 install: 16 install:
......
...@@ -46,20 +46,18 @@ Recommended Commit Message : Remove unused imports ...@@ -46,20 +46,18 @@ Recommended Commit Message : Remove unused imports
46 To solve this problem, use a new embedding called [`patch_type_embeddings`](https://github.com/graykode/commit-autosuggestions/blob/master/commit/model/diff_roberta.py#L40) that can distinguish added and deleted, just as the XLM(Lample et al, 2019) used language embedding. (1 for added, 2 for deleted.) 46 To solve this problem, use a new embedding called [`patch_type_embeddings`](https://github.com/graykode/commit-autosuggestions/blob/master/commit/model/diff_roberta.py#L40) that can distinguish added and deleted, just as the XLM(Lample et al, 2019) used language embedding. (1 for added, 2 for deleted.)
47 47
48 ### Language support 48 ### Language support
49 -| Language | Added | Diff | 49 +| Language | Added | Diff | Data(Only Diff) | Weights |
50 -| :------------- | :---: | :---:| 50 +| :------------- | :---: | :---:| :---: | :---:|
51 -| Python | ✅ | ✅ | 51 +| Python | ✅ | ✅ | [423k](https://drive.google.com/drive/folders/1_8lQmzTH95Nc-4MKd1RP3x4BVc8tBA6W?usp=sharing) | [Link](https://drive.google.com/drive/folders/1OwM7_FiLiwVJAhAanBPWtPw3Hz3Dszbh?usp=sharing) |
52 -| JavaScript | ⬜ | ⬜ | 52 +| JavaScript | ✅ | ✅ | [514k](https://drive.google.com/drive/folders/1-Hv0VZWSAGqs-ewNT6NhLKEqDH2oa1az?usp=sharing) | [Link](https://drive.google.com/drive/folders/1Jw8vXfxUXsfElga_Gi6e7Uhfc_HlmOuD?usp=sharing) |
53 -| Go | ⬜ | ⬜ | 53 +| Go | ⬜ | ⬜ | ⬜ | ⬜ |
54 -| JAVA | ⬜ | ⬜ | 54 +| JAVA | ⬜ | ⬜ | ⬜ | ⬜ |
55 -| Ruby | ⬜ | ⬜ | 55 +| Ruby | ⬜ | ⬜ | ⬜ | ⬜ |
56 -| PHP | ⬜ | ⬜ | 56 +| PHP | ⬜ | ⬜ | ⬜ | ⬜ |
57 * ✅ — Supported 57 * ✅ — Supported
58 -* 🔶 — Partial support
59 -* 🚧 — Under development
60 * ⬜ - N/A ️ 58 * ⬜ - N/A ️
61 59
62 -We plan to slowly conquer languages that are not currently supported. However, I also need to use expensive GPU instances of AWS or GCP to train about the above languages. Please do a simple sponsor for this! 60 +We plan to slowly conquer languages that are not currently supported. However, I also need to use expensive GPU instances of AWS or GCP to train the above languages. Please consider sponsoring this work! The data for the Added model is the [CodeSearchNet dataset](https://drive.google.com/uc?id=1rd2Tc6oUWBo7JouwexW3ksQ0PaOhUr6h).
63 61
64 ### Quick Start 62 ### Quick Start
65 To run this project, you need a flask-based inference server (GPU) and a client (commit module). If you don't have a GPU, don't worry, you can use it through Google Colab. 63 To run this project, you need a flask-based inference server (GPU) and a client (commit module). If you don't have a GPU, don't worry, you can use it through Google Colab.
...@@ -68,9 +66,18 @@ To run this project, you need a flask-based inference server (GPU) and a client ...@@ -68,9 +66,18 @@ To run this project, you need a flask-based inference server (GPU) and a client
68 Prepare Docker and Nvidia-docker before running the server. 66 Prepare Docker and Nvidia-docker before running the server.
69 67
70 ##### 1-a. If you have GPU machine. 68 ##### 1-a. If you have GPU machine.
71 -Serve flask server with Nvidia Docker 69 +Serve the Flask server with Nvidia Docker. Check the Docker tag for each programming language [here](https://hub.docker.com/repository/registry-1.docker.io/graykode/commit-autosuggestions/tags).
70 +| Language | Tag |
71 +| :------------- | :---: |
72 +| Python | py |
73 +| JavaScript | js |
74 +| Go | go |
75 +| JAVA | java |
76 +| Ruby | ruby |
77 +| PHP | php |
78 +
72 ```shell script 79 ```shell script
73 -$ docker run -it --gpus 0 -p 5000:5000 commit-autosuggestions:0.1-gpu 80 +$ docker run -it -d --gpus 0 -p 5000:5000 graykode/commit-autosuggestions:{language}
74 ``` 81 ```
75 82
76 ##### 1-b. If you don't have GPU machine. 83 ##### 1-b. If you don't have GPU machine.
......
...@@ -146,7 +146,7 @@ def main(args): ...@@ -146,7 +146,7 @@ def main(args):
146 146
147 if __name__ == '__main__': 147 if __name__ == '__main__':
148 parser = argparse.ArgumentParser(description="") 148 parser = argparse.ArgumentParser(description="")
149 - parser.add_argument("--load_model_path", default='weight', type=str, 149 + parser.add_argument("--load_model_path", type=str, required=True,
150 help="Path to trained model: Should contain the .bin files") 150 help="Path to trained model: Should contain the .bin files")
151 151
152 parser.add_argument("--model_type", default='roberta', type=str, 152 parser.add_argument("--model_type", default='roberta', type=str,
......
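Since `--load_model_path` no longer has a default, every caller (including the Dockerfile `ENTRYPOINT`) now has to pass it explicitly. A minimal sketch of that behaviour, assuming only the one argument shown in the hunk above; `build_parser` is a hypothetical helper and not part of the repository:

```python
import argparse


def build_parser():
    # Hypothetical helper: reproduces only the changed argument from the hunk above.
    parser = argparse.ArgumentParser(description="")
    parser.add_argument("--load_model_path", type=str, required=True,
                        help="Path to trained model: Should contain the .bin files")
    return parser


# Passing the path explicitly, as the new Dockerfile ENTRYPOINT does.
args = build_parser().parse_args(["--load_model_path", "./weight/javascript/"])
print(args.load_model_path)
```

Calling `parse_args([])` on this parser exits with a usage error instead of silently falling back to `weight`.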
1 +# Change Log
2 +version : v0.1.0
3 +
4 +## change things
5 +
6 +### Bug Fixes
7 +- Modify the weight path in the Dockerfile.
8 +
9 +### New Features
10 +- JavaScript Language Support.
11 +- Detach multiple settings (Unittest, Dockerfile) for Language support.
12 +
13 +### New Examples
...\ No newline at end of file ...\ No newline at end of file
...@@ -56,8 +56,15 @@ ...@@ -56,8 +56,15 @@
56 "#### Download model weights\n", 56 "#### Download model weights\n",
57 "\n", 57 "\n",
58 "Download the two weights of model from the google drive through the gdown module.\n", 58 "Download the two weights of model from the google drive through the gdown module.\n",
59 - "1. [Added model](https://drive.google.com/uc?id=1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4) : A model trained Code2NL on Python using pre-trained CodeBERT (Feng et al., 2020).\n", 59 + "1. Added model : A model trained Code2NL on Python using pre-trained CodeBERT (Feng et al., 2020).\n",
60 - "2. [Diff model](https://drive.google.com/uc?id=1--gcVVix92_Fp75A-mWH0pJS0ahlni5m) : A model retrained by initializing with the weight of model (1), adding embedding of the added and deleted parts(`patch_ids_embedding`) of the code." 60 + "2. Diff model : A model retrained by initializing with the weight of model (1), adding embedding of the added and deleted parts(`patch_ids_embedding`) of the code.\n",
61 + "\n",
62 + "Download pre-trained weight\n",
63 + "\n",
64 + "Language | Added | Diff\n",
65 + "--- | --- | ---\n",
66 + "python | 1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4 | 1--gcVVix92_Fp75A-mWH0pJS0ahlni5m\n",
67 + "javascript | 1-F68ymKxZ-htCzQ8_Y9iHexs2SJmP5Gc | 1-39rmu-3clwebNURMQGMt-oM4HsAkbsf"
61 ] 68 ]
62 }, 69 },
63 { 70 {
...@@ -66,9 +73,12 @@ ...@@ -66,9 +73,12 @@
66 "id": "P9-EBpxt0Dp0" 73 "id": "P9-EBpxt0Dp0"
67 }, 74 },
68 "source": [ 75 "source": [
76 + "ADD_MODEL='1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4'\n",
77 + "DIFF_MODEL='1--gcVVix92_Fp75A-mWH0pJS0ahlni5m'\n",
78 + "\n",
69 "!pip install gdown \\\n", 79 "!pip install gdown \\\n",
70 - " && gdown \"https://drive.google.com/uc?id=1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4\" -O weight/added/pytorch_model.bin \\\n", 80 + " && gdown \"https://drive.google.com/uc?id=$ADD_MODEL\" -O weight/added/pytorch_model.bin \\\n",
71 - " && gdown \"https://drive.google.com/uc?id=1--gcVVix92_Fp75A-mWH0pJS0ahlni5m\" -O weight/diff/pytorch_model.bin" 81 + " && gdown \"https://drive.google.com/uc?id=$DIFF_MODEL\" -O weight/diff/pytorch_model.bin"
72 ], 82 ],
73 "execution_count": null, 83 "execution_count": null,
74 "outputs": [] 84 "outputs": []
......
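The notebook change swaps hard-coded download URLs for per-language file IDs. As a rough sketch of the same idea in plain Python, the IDs from the table above can be kept in one mapping and turned into `gdown`-style URLs; `WEIGHT_IDS` and `weight_url` are illustrative names, not something the repository defines:

```python
# File IDs copied from the "Download pre-trained weight" table in the notebook hunk.
WEIGHT_IDS = {
    "python": {
        "added": "1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4",
        "diff": "1--gcVVix92_Fp75A-mWH0pJS0ahlni5m",
    },
    "javascript": {
        "added": "1-F68ymKxZ-htCzQ8_Y9iHexs2SJmP5Gc",
        "diff": "1-39rmu-3clwebNURMQGMt-oM4HsAkbsf",
    },
}


def weight_url(lang, kind):
    # Hypothetical helper: builds the same URL shape the notebook passes to gdown.
    return f"https://drive.google.com/uc?id={WEIGHT_IDS[lang][kind]}"


for kind in ("added", "diff"):
    print(kind, weight_url("javascript", kind))
```

Switching the notebook to JavaScript then amounts to setting `ADD_MODEL` and `DIFF_MODEL` to the second row of the table.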
1 +FROM nvcr.io/nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
2 +LABEL maintainer="nlkey2022@gmail.com"
3 +
4 +RUN DEBIAN_FRONTEND=noninteractive apt-get -qq update \
5 + && DEBIAN_FRONTEND=noninteractive apt-get -qqy install curl python3-pip git \
6 + && rm -rf /var/lib/apt/lists/*
7 +
8 +ARG PYTORCH_WHEEL="https://download.pytorch.org/whl/cu101/torch-1.6.0%2Bcu101-cp36-cp36m-linux_x86_64.whl"
9 +ARG ADDED_MODEL="1-F68ymKxZ-htCzQ8_Y9iHexs2SJmP5Gc"
10 +ARG DIFF_MODEL="1-39rmu-3clwebNURMQGMt-oM4HsAkbsf"
11 +
12 +RUN git clone https://github.com/graykode/commit-autosuggestions.git /app/commit-autosuggestions \
13 + && cd /app/commit-autosuggestions
14 +
15 +WORKDIR /app/commit-autosuggestions
16 +
17 +RUN pip3 install ${PYTORCH_WHEEL} gdown
18 +RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/javascript/added/
19 +RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/javascript/diff/
20 +
21 +RUN pip3 install -r requirements.txt
22 +
23 +ENTRYPOINT ["python3", "app.py", "--load_model_path", "./weight/javascript/"]
...@@ -10,14 +10,14 @@ ARG ADDED_MODEL="1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4" ...@@ -10,14 +10,14 @@ ARG ADDED_MODEL="1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4"
10 ARG DIFF_MODEL="1--gcVVix92_Fp75A-mWH0pJS0ahlni5m" 10 ARG DIFF_MODEL="1--gcVVix92_Fp75A-mWH0pJS0ahlni5m"
11 11
12 RUN git clone https://github.com/graykode/commit-autosuggestions.git /app/commit-autosuggestions \ 12 RUN git clone https://github.com/graykode/commit-autosuggestions.git /app/commit-autosuggestions \
13 - && cd /app/commit-autosuggestions && python3 setup.py install 13 + && cd /app/commit-autosuggestions
14 14
15 WORKDIR /app/commit-autosuggestions 15 WORKDIR /app/commit-autosuggestions
16 16
17 RUN pip3 install ${PYTORCH_WHEEL} gdown 17 RUN pip3 install ${PYTORCH_WHEEL} gdown
18 -RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/added/ 18 +RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/python/added/
19 -RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/diff/ 19 +RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/python/diff/
20 20
21 RUN pip3 install -r requirements.txt 21 RUN pip3 install -r requirements.txt
22 22
23 -ENTRYPOINT ["python3", "app.py"] 23 +ENTRYPOINT ["python3", "app.py", "--load_model_path", "./weight/python/"]
......
...@@ -104,6 +104,8 @@ optional arguments: ...@@ -104,6 +104,8 @@ optional arguments:
104 The maximum total target sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded. 104 The maximum total target sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded.
105 ``` 105 ```
106 106
107 +> If `UnicodeDecodeError` occurs while using gitparser.py, you must use a version of the [GitPython](https://github.com/gitpython-developers/GitPython) package that includes at least [this commit](https://github.com/gitpython-developers/GitPython/commit/bfbd5ece215dea328c3c6c4cba31225caa66ae9a).
108 +
107 #### 3. Training Added model(Optional for Python Language). 109 #### 3. Training Added model(Optional for Python Language).
108 Python has learned the Added model. So, if you only want to make a Diff model for the Python language, step 3 can be ignored. However, for other languages (JavaScript, GO, Ruby, PHP and JAVA), [Code2NL training](https://github.com/microsoft/CodeBERT#fine-tune-1) is required to use as the initial weight of the model to be used in step 4. 110 Python has learned the Added model. So, if you only want to make a Diff model for the Python language, step 3 can be ignored. However, for other languages (JavaScript, GO, Ruby, PHP and JAVA), [Code2NL training](https://github.com/microsoft/CodeBERT#fine-tune-1) is required to use as the initial weight of the model to be used in step 4.
109 111
......
...@@ -24,6 +24,15 @@ from multiprocessing.pool import Pool ...@@ -24,6 +24,15 @@ from multiprocessing.pool import Pool
24 from transformers import RobertaTokenizer 24 from transformers import RobertaTokenizer
25 from pydriller import RepositoryMining 25 from pydriller import RepositoryMining
26 26
27 +language = {
28 + 'py' : ['.py'],
29 + 'js' : ['.js', '.ts'],
30 + 'go' : ['.go'],
31 + 'java' : ['.java'],
32 + 'ruby' : ['.rb'],
33 + 'php' : ['.php']
34 +}
35 +
27 def message_cleaner(message): 36 def message_cleaner(message):
28 msg = message.split("\n")[0] 37 msg = message.split("\n")[0]
29 msg = re.sub(r"(\(|)#([0-9])+(\)|)", "", msg) 38 msg = re.sub(r"(\(|)#([0-9])+(\)|)", "", msg)
...@@ -34,7 +43,7 @@ def jobs(repo, args): ...@@ -34,7 +43,7 @@ def jobs(repo, args):
34 repo_path = os.path.join(args.repos_dir, repo) 43 repo_path = os.path.join(args.repos_dir, repo)
35 if os.path.exists(repo_path): 44 if os.path.exists(repo_path):
36 for commit in RepositoryMining( 45 for commit in RepositoryMining(
37 - repo_path, only_modifications_with_file_types=['.py'] 46 + repo_path, only_modifications_with_file_types=language[args.lang]
38 ).traverse_commits(): 47 ).traverse_commits():
39 cleaned_message = message_cleaner(commit.msg) 48 cleaned_message = message_cleaner(commit.msg)
40 tokenized_message = args.tokenizer.tokenize(cleaned_message) 49 tokenized_message = args.tokenizer.tokenize(cleaned_message)
...@@ -44,7 +53,7 @@ def jobs(repo, args): ...@@ -44,7 +53,7 @@ def jobs(repo, args):
44 for mod in commit.modifications: 53 for mod in commit.modifications:
45 if not (mod.old_path and mod.new_path): 54 if not (mod.old_path and mod.new_path):
46 continue 55 continue
47 - if os.path.splitext(mod.new_path)[1] != '.py': 56 + if os.path.splitext(mod.new_path)[1] not in language[args.lang]:
48 continue 57 continue
49 if not mod.diff_parsed["added"]: 58 if not mod.diff_parsed["added"]:
50 continue 59 continue
...@@ -121,6 +130,9 @@ if __name__ == "__main__": ...@@ -121,6 +130,9 @@ if __name__ == "__main__":
121 help="directory that all repositories had been downloaded.",) 130 help="directory that all repositories had been downloaded.",)
122 parser.add_argument("--output_dir", type=str, required=True, 131 parser.add_argument("--output_dir", type=str, required=True,
123 help="The output directory where the preprocessed data will be written.") 132 help="The output directory where the preprocessed data will be written.")
133 + parser.add_argument("--lang", type=str, required=True,
134 + choices=['py', 'js', 'go', 'java', 'ruby', 'php'],
135 + help="Programming language to preprocess; selects which file extensions are kept.")
124 parser.add_argument("--tokenizer_name", type=str, 136 parser.add_argument("--tokenizer_name", type=str,
125 default="microsoft/codebert-base", help="The name of tokenizer",) 137 default="microsoft/codebert-base", help="The name of tokenizer",)
126 parser.add_argument("--num_workers", default=4, type=int, help="number of process") 138 parser.add_argument("--num_workers", default=4, type=int, help="number of process")
......
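The gitparser change replaces the hard-coded `'.py'` check with a lookup in the new `language` mapping, so one `--lang` value can cover several extensions (for example `js` also keeps `.ts` files). A small standalone sketch of that filter; the mapping is copied from the hunk above, while `is_target_file` is a hypothetical helper added here for illustration:

```python
import os

# Same mapping as the `language` dict added to gitparser.py in the hunk above.
language = {
    'py': ['.py'],
    'js': ['.js', '.ts'],
    'go': ['.go'],
    'java': ['.java'],
    'ruby': ['.rb'],
    'php': ['.php'],
}


def is_target_file(path, lang):
    # Hypothetical helper: mirrors the `os.path.splitext(...)[1] not in language[args.lang]`
    # check, but returns True for files that should be kept.
    return os.path.splitext(path)[1] in language[lang]


print(is_target_file("src/index.ts", "js"))
print(is_target_file("README.md", "js"))
```

Because `--lang` is restricted by `choices`, the dictionary lookup cannot raise `KeyError` for values that reach this code through the command line.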
1 +https://github.com/freeCodeCamp/freeCodeCamp
2 +https://github.com/vuejs/vue
3 +https://github.com/facebook/react
4 +https://github.com/twbs/bootstrap
5 +https://github.com/airbnb/javascript
6 +https://github.com/d3/d3
7 +https://github.com/facebook/react-native
8 +https://github.com/trekhleb/javascript-algorithms
9 +https://github.com/facebook/create-react-app
10 +https://github.com/axios/axios
11 +https://github.com/nodejs/node
12 +https://github.com/mrdoob/three.js
13 +https://github.com/mui-org/material-ui
14 +https://github.com/angular/angular.js
15 +https://github.com/vercel/next.js
16 +https://github.com/webpack/webpack
17 +https://github.com/jquery/jquery
18 +https://github.com/hakimel/reveal.js
19 +https://github.com/atom/atom
20 +https://github.com/socketio/socket.io
21 +https://github.com/chartjs/Chart.js
22 +https://github.com/expressjs/express
23 +https://github.com/typicode/json-server
24 +https://github.com/adam-p/markdown-here
25 +https://github.com/Semantic-Org/Semantic-UI
26 +https://github.com/h5bp/html5-boilerplate
27 +https://github.com/gatsbyjs/gatsby
28 +https://github.com/lodash/lodash
29 +https://github.com/yangshun/tech-interview-handbook
30 +https://github.com/moment/moment
31 +https://github.com/apache/incubator-echarts
32 +https://github.com/meteor/meteor
33 +https://github.com/ReactTraining/react-router
34 +https://github.com/yarnpkg/yarn
35 +https://github.com/sveltejs/svelte
36 +https://github.com/Dogfalo/materialize
37 +https://github.com/prettier/prettier
38 +https://github.com/serverless/serverless
39 +https://github.com/babel/babel
40 +https://github.com/nwjs/nw.js
41 +https://github.com/juliangarnier/anime
42 +https://github.com/parcel-bundler/parcel
43 +https://github.com/ColorlibHQ/AdminLTE
44 +https://github.com/impress/impress.js
45 +https://github.com/TryGhost/Ghost
46 +https://github.com/Unitech/pm2
47 +https://github.com/mozilla/pdf.js
48 +https://github.com/mermaid-js/mermaid
49 +https://github.com/algorithm-visualizer/algorithm-visualizer
50 +https://github.com/adobe/brackets
51 +https://github.com/gulpjs/gulp
52 +https://github.com/hexojs/hexo
53 +https://github.com/styled-components/styled-components
54 +https://github.com/nuxt/nuxt.js
55 +https://github.com/sahat/hackathon-starter
56 +https://github.com/alvarotrigo/fullPage.js
57 +https://github.com/strapi/strapi
58 +https://github.com/immutable-js/immutable-js
59 +https://github.com/koajs/koa
60 +https://github.com/videojs/video.js
61 +https://github.com/zenorocha/clipboard.js
62 +https://github.com/Leaflet/Leaflet
63 +https://github.com/RocketChat/Rocket.Chat
64 +https://github.com/photonstorm/phaser
65 +https://github.com/quilljs/quill
66 +https://github.com/jashkenas/backbone
67 +https://github.com/preactjs/preact
68 +https://github.com/tastejs/todomvc
69 +https://github.com/caolan/async
70 +https://github.com/vuejs/vue-cli
71 +https://github.com/react-boilerplate/react-boilerplate
72 +https://github.com/aosabook/500lines
73 +https://github.com/carbon-app/carbon
74 +https://github.com/Marak/faker.js
75 +https://github.com/jashkenas/underscore
76 +https://github.com/lerna/lerna
77 +https://github.com/nolimits4web/swiper
78 +https://github.com/vuejs/vuex
79 +https://github.com/request/request
80 +https://github.com/select2/select2
81 +https://github.com/Modernizr/Modernizr
82 +https://github.com/facebook/draft-js
83 +https://github.com/rollup/rollup
84 +https://github.com/jlmakes/scrollreveal
85 +https://github.com/tj/commander.js
86 +https://github.com/chenglou/react-motion
87 +https://github.com/swagger-api/swagger-ui
88 +https://github.com/bilibili/flv.js
89 +https://github.com/segmentio/nightmare
90 +https://github.com/laurent22/joplin
91 +https://github.com/react-bootstrap/react-bootstrap
92 +https://github.com/sampotts/plyr
93 +https://github.com/avajs/ava
94 +https://github.com/immerjs/immer
95 +https://github.com/jorgebucaran/hyperapp
96 +https://github.com/jaredhanson/passport
97 +https://github.com/lovell/sharp
98 +https://github.com/localForage/localForage
99 +https://github.com/Popmotion/popmotion
100 +https://github.com/vuejs/vuepress
...\ No newline at end of file ...\ No newline at end of file
1 +diff --git a/function.js b/function.js
2 +new file mode 100644
3 +index 0000000..ba89d9a
4 +--- /dev/null
5 ++++ b/function.js
6 +@@ -0,0 +1,6 @@
7 ++function getIntoAnArgument() {
8 ++ var args = arguments.slice();
9 ++ args.forEach(function(arg) {
10 ++ console.log(arg);
11 ++ });
12 ++}
13 +\ No newline at end of file
1 +diff --git a/function.js b/function.js
2 +index ba89d9a..d440734 100644
3 +--- a/function.js
4 ++++ b/function.js
5 +@@ -1,6 +1,3 @@
6 +-function getIntoAnArgument() {
7 +- var args = arguments.slice();
8 +- args.forEach(function(arg) {
9 +- console.log(arg);
10 +- });
11 ++function getIntoAnArgument(...args) {
12 ++ args.forEach(arg => console.log(arg));
13 + }
14 +\ No newline at end of file
...@@ -65,10 +65,6 @@ class CitiesTestCase(unittest.TestCase): ...@@ -65,10 +65,6 @@ class CitiesTestCase(unittest.TestCase):
65 ) 65 )
66 ) 66 )
67 self.assertEqual(response.status_code, 200) 67 self.assertEqual(response.status_code, 200)
68 - self.assertEqual(
69 - json.loads(response.text),
70 - {'idx': 0, 'message': ['Test method .']}
71 - )
72 68
73 def test_added(self): 69 def test_added(self):
74 response = requests.post( 70 response = requests.post(
...@@ -83,10 +79,6 @@ class CitiesTestCase(unittest.TestCase): ...@@ -83,10 +79,6 @@ class CitiesTestCase(unittest.TestCase):
83 ) 79 )
84 ) 80 )
85 self.assertEqual(response.status_code, 200) 81 self.assertEqual(response.status_code, 200)
86 - self.assertEqual(
87 - json.loads(response.text),
88 - {'idx': 0, 'message': ['Fix typo']}
89 - )
90 82
91 83
92 def suite(): 84 def suite():
......