Merge pull request #1 from graykode/0.1.0

JavaScript Language is supported!!

Commit 4f96aeae5f221141b647128bd2cadcfbf36f9c11 (2 parents: 65a5c78c, 974ec169)
Showing 18 changed files with 223 additions and 34 deletions
.travis.yml
README.md
app.py
change_logs/v0.1.0.md
commit_autosuggestions.ipynb
docker/javascript/Dockerfile
docker/Dockerfile → docker/python/Dockerfile
docs/training.md
gitparser.py
repositories/javascript.txt
repositories.txt → repositories/python.txt
tests/javascript/added.diff
tests/javascript/fixed.diff
tests/added.diff → tests/python/added.diff
tests/fixed.diff → tests/python/fixed.diff
tests/test_suite.py
weight/added/.keep → weights/python/added/.keep
weight/diff/.keep → weights/python/diff/.keep
--- a/.travis.yml
+++ b/.travis.yml
@@ -2,12 +2,15 @@ language: python
 python:
   - "3.6"
+env:
+  - LANGUAGE="py"
+
 services:
   - docker
 before_install:
-  - docker pull graykode/commit-autosuggestions
+  - docker pull graykode/commit-autosuggestions:${LANGUAGE}
-  - docker run -it -d -p 5000:5000 --restart always graykode/commit-autosuggestions
+  - docker run -it -d -p 5000:5000 --restart always graykode/commit-autosuggestions:${LANGUAGE}
 # command to install dependencies
 install:
--- a/README.md
+++ b/README.md
@@ -46,20 +46,18 @@ Recommended Commit Message : Remove unused imports
 To solve this problem, use a new embedding called [`patch_type_embeddings`](https://github.com/graykode/commit-autosuggestions/blob/master/commit/model/diff_roberta.py#L40) that distinguishes added from deleted code, just as XLM (Lample et al., 2019) used a language embedding (1 for added, 2 for deleted).
 ### Language support
-| Language       | Added | Diff |
+| Language       | Added | Diff |  Data(Only Diff) | Weights |
-| :------------- | :---: | :---:|
+| :------------- | :---: | :---:| :---: | :---:|
-| Python         | ✅    | ✅    |
+| Python         | ✅    | ✅   | [423k](https://drive.google.com/drive/folders/1_8lQmzTH95Nc-4MKd1RP3x4BVc8tBA6W?usp=sharing) |  [Link](https://drive.google.com/drive/folders/1OwM7_FiLiwVJAhAanBPWtPw3Hz3Dszbh?usp=sharing)  |
-| JavaScript     | ⬜    | ⬜    |
+| JavaScript     | ✅    | ✅   | [514k](https://drive.google.com/drive/folders/1-Hv0VZWSAGqs-ewNT6NhLKEqDH2oa1az?usp=sharing) |  [Link](https://drive.google.com/drive/folders/1Jw8vXfxUXsfElga_Gi6e7Uhfc_HlmOuD?usp=sharing)  |
-| Go             | ⬜    | ⬜    |
+| Go             | ⬜    | ⬜   | ⬜ |  ⬜  |
-| JAVA           | ⬜    | ⬜    |
+| JAVA           | ⬜    | ⬜   | ⬜ |  ⬜  |
-| Ruby           | ⬜    | ⬜    |
+| Ruby           | ⬜    | ⬜   | ⬜ |  ⬜  |
-| PHP            | ⬜    | ⬜    |
+| PHP            | ⬜    | ⬜   | ⬜ |  ⬜  |
 * ✅ — Supported
-* 🔶 — Partial support
-* 🚧 — Under development
 * ⬜ - N/A
-We plan to slowly conquer languages that are not currently supported. However, I also need to use expensive GPU instances of AWS or GCP to train about the above languages. Please do a simple sponsor for this!
+We plan to gradually add the languages that are not yet supported. However, training models for the languages above requires expensive GPU instances on AWS or GCP, so please consider sponsoring this project! The data for the Added model is the [CodeSearchNet dataset](https://drive.google.com/uc?id=1rd2Tc6oUWBo7JouwexW3ksQ0PaOhUr6h).
 ### Quick Start
 To run this project, you need a flask-based inference server (GPU) and a client (commit module). If you don't have a GPU, don't worry, you can use it through Google Colab.
@@ -68,9 +66,18 @@ To run this project, you need a flask-based inference server (GPU) and a client 
 Prepare Docker and Nvidia-docker before running the server.
 ##### 1-a. If you have GPU machine.
-Serve flask server with Nvidia Docker
+Serve the Flask server with Nvidia Docker. Check the Docker tag for each programming language [here](https://hub.docker.com/repository/registry-1.docker.io/graykode/commit-autosuggestions/tags).
+| Language       | Tag   |
+| :------------- | :---: |
+| Python         | py    |
+| JavaScript     | js    |
+| Go             | go    |
+| JAVA           | java  |
+| Ruby           | ruby  |
+| PHP            | php   |
+
 ```shell script
-$ docker run -it --gpus 0 -p 5000:5000 commit-autosuggestions:0.1-gpu
+$ docker run -it -d --gpus 0 -p 5000:5000 graykode/commit-autosuggestions:{language}
 ```
 ##### 1-b. If you don't have GPU machine.
--- a/app.py
+++ b/app.py
@@ -146,7 +146,7 @@ def main(args):
 if __name__ == '__main__':
     parser = argparse.ArgumentParser(description="")
-    parser.add_argument("--load_model_path", default='weight', type=str,
+    parser.add_argument("--load_model_path", type=str, required=True,
                         help="Path to trained model: Should contain the .bin files")
     parser.add_argument("--model_type", default='roberta', type=str,
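
Because `--load_model_path` no longer has a default, starting the server without it now fails at argument parsing. The following is a minimal sketch of that behaviour using only `argparse`; `build_parser` and `parse_or_none` are hypothetical helper names for illustration and are not part of `app.py`.

```python
import argparse


def build_parser():
    # Only the option changed in this commit is reproduced here.
    parser = argparse.ArgumentParser(description="")
    parser.add_argument("--load_model_path", type=str, required=True,
                        help="Path to trained model: Should contain the .bin files")
    return parser


def parse_or_none(argv):
    """Return the parsed namespace, or None when argparse rejects argv.

    argparse reports a missing required option by raising SystemExit,
    which this hypothetical helper catches so the result can be inspected.
    """
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


if __name__ == "__main__":
    print(parse_or_none([]))
    print(parse_or_none(["--load_model_path", "./weight/python/"]))
```

This is why both Dockerfiles in this commit pass `--load_model_path` explicitly in their `ENTRYPOINT`.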
--- a/change_logs/v0.1.0.md 0 → 100644
+++ b/change_logs/v0.1.0.md 0 → 100644
+# Change Log
+version : v0.1.0
+
+## change things
+
+### Bug Fixes
+- Modify the weight path in the Dockerfile.
+
+### New Features
+- JavaScript Language Support.
+- Detach multiple settings (Unittest, Dockerfile) for Language support.
+
+### New Examples
\ No newline at end of file
--- a/commit_autosuggestions.ipynb
+++ b/commit_autosuggestions.ipynb
@@ -56,8 +56,15 @@
     "#### Download model weights\n",
     "\n",
     "Download the two weights of model from the google drive through the gdown module.\n",
-        "1. [Added model](https://drive.google.com/uc?id=1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4) : A model trained Code2NL on Python using pre-trained CodeBERT (Feng at al, 2020).\n",
+    "1. Added model : A model trained Code2NL on Python using pre-trained CodeBERT (Feng et al., 2020).\n",
-        "2. [Diff model](https://drive.google.com/uc?id=1--gcVVix92_Fp75A-mWH0pJS0ahlni5m) : A model retrained by initializing with the weight of model (1), adding embedding of the added and deleted parts(`patch_ids_embedding`) of the code."
+    "2. Diff model : A model retrained by initializing with the weight of model (1), adding embedding of the added and deleted parts(`patch_ids_embedding`) of the code.\n",
+    "\n",
+    "Download pre-trained weight\n",
+    "\n",
+    "Language | Added | Diff\n",
+    "--- | --- | ---\n",
+    "python | 1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4 | 1--gcVVix92_Fp75A-mWH0pJS0ahlni5m\n",
+    "javascript | 1-F68ymKxZ-htCzQ8_Y9iHexs2SJmP5Gc | 1-39rmu-3clwebNURMQGMt-oM4HsAkbsf"
    ]
   },
   {
@@ -66,9 +73,12 @@
     "id": "P9-EBpxt0Dp0"
    },
    "source": [
+    "ADD_MODEL='1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4'\n",
+    "DIFF_MODEL='1--gcVVix92_Fp75A-mWH0pJS0ahlni5m'\n",
+    "\n",
     "!pip install gdown \\\n",
-        "    && gdown \"https://drive.google.com/uc?id=1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4\" -O weight/added/pytorch_model.bin \\\n",
+    "    && gdown \"https://drive.google.com/uc?id=$ADD_MODEL\" -O weight/added/pytorch_model.bin \\\n",
-        "    && gdown \"https://drive.google.com/uc?id=1--gcVVix92_Fp75A-mWH0pJS0ahlni5m\" -O weight/diff/pytorch_model.bin"
+    "    && gdown \"https://drive.google.com/uc?id=$DIFF_MODEL\" -O weight/diff/pytorch_model.bin"
    ],
    "execution_count": null,
    "outputs": []
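
The notebook hard-codes the Python file IDs in `ADD_MODEL` and `DIFF_MODEL`; switching languages means swapping in the IDs from the table. A minimal sketch of that lookup follows. The IDs and output paths are copied from the notebook diff, while `WEIGHT_IDS` and `gdown_commands` are hypothetical names introduced only for illustration.

```python
# Google Drive file IDs copied from the table in the notebook diff.
WEIGHT_IDS = {
    "python": {
        "added": "1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4",
        "diff": "1--gcVVix92_Fp75A-mWH0pJS0ahlni5m",
    },
    "javascript": {
        "added": "1-F68ymKxZ-htCzQ8_Y9iHexs2SJmP5Gc",
        "diff": "1-39rmu-3clwebNURMQGMt-oM4HsAkbsf",
    },
}


def gdown_commands(lang):
    """Build the two gdown shell commands for `lang` (hypothetical helper).

    The output paths follow the notebook cell: weight/added/ and weight/diff/.
    """
    ids = WEIGHT_IDS[lang]
    template = 'gdown "https://drive.google.com/uc?id={}" -O weight/{}/pytorch_model.bin'
    return [template.format(ids[kind], kind) for kind in ("added", "diff")]


if __name__ == "__main__":
    for command in gdown_commands("javascript"):
        print(command)
```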
--- a/docker/javascript/Dockerfile 0 → 100644
+++ b/docker/javascript/Dockerfile 0 → 100644
+FROM nvcr.io/nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
+LABEL maintainer="nlkey2022@gmail.com"
+
+RUN DEBIAN_FRONTEND=noninteractive apt-get -qq update \
+ && DEBIAN_FRONTEND=noninteractive apt-get -qqy install curl python3-pip git \
+ && rm -rf /var/lib/apt/lists/*
+
+ARG PYTORCH_WHEEL="https://download.pytorch.org/whl/cu101/torch-1.6.0%2Bcu101-cp36-cp36m-linux_x86_64.whl"
+ARG ADDED_MODEL="1-F68ymKxZ-htCzQ8_Y9iHexs2SJmP5Gc"
+ARG DIFF_MODEL="1-39rmu-3clwebNURMQGMt-oM4HsAkbsf"
+
+RUN git clone https://github.com/graykode/commit-autosuggestions.git /app/commit-autosuggestions \
+    && cd /app/commit-autosuggestions
+
+WORKDIR /app/commit-autosuggestions
+
+RUN pip3 install ${PYTORCH_WHEEL} gdown
+RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/javascript/added/
+RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/javascript/diff/
+
+RUN pip3 install -r requirements.txt
+
+ENTRYPOINT ["python3", "app.py", "--load_model_path", "./weight/javascript/"]
--- a/docker/Dockerfile → docker/python/Dockerfile
+++ b/docker/Dockerfile → docker/python/Dockerfile
@@ -10,14 +10,14 @@ ARG ADDED_MODEL="1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4"
 ARG DIFF_MODEL="1--gcVVix92_Fp75A-mWH0pJS0ahlni5m"
 RUN git clone https://github.com/graykode/commit-autosuggestions.git /app/commit-autosuggestions \
-    && cd /app/commit-autosuggestions && python3 setup.py install
+    && cd /app/commit-autosuggestions
 WORKDIR /app/commit-autosuggestions
 RUN pip3 install ${PYTORCH_WHEEL} gdown
-RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/added/
+RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/python/added/
-RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/diff/
+RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/python/diff/
 RUN pip3 install -r requirements.txt
-ENTRYPOINT ["python3", "app.py"]
+ENTRYPOINT ["python3", "app.py", "--load_model_path", "./weight/python/"]
--- a/docs/training.md
+++ b/docs/training.md
@@ -104,6 +104,8 @@ optional arguments:
                         The maximum total target sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded.
 ```
+> If a `UnicodeDecodeError` occurs while using gitparser.py, you must use a version of the [GitPython](https://github.com/gitpython-developers/GitPython) package that includes at least [this commit](https://github.com/gitpython-developers/GitPython/commit/bfbd5ece215dea328c3c6c4cba31225caa66ae9a).
+
 #### 3. Training Added model(Optional for Python Language).
 The Added model has already been trained for Python, so if you only want to build a Diff model for Python, step 3 can be skipped. However, for other languages (JavaScript, Go, Ruby, PHP and Java), [Code2NL training](https://github.com/microsoft/CodeBERT#fine-tune-1) is required to produce the initial weights for the model used in step 4.
--- a/gitparser.py
+++ b/gitparser.py
@@ -24,6 +24,15 @@ from multiprocessing.pool import Pool
 from transformers import RobertaTokenizer
 from pydriller import RepositoryMining
+language = {
+    'py' : ['.py'],
+    'js' : ['.js', '.ts'],
+    'go' : ['.go'],
+    'java' : ['.java'],
+    'ruby' : ['.rb'],
+    'php' : ['.php']
+}
+
 def message_cleaner(message):
     msg = message.split("\n")[0]
     msg = re.sub(r"(\(|)#([0-9])+(\)|)", "", msg)
@@ -34,7 +43,7 @@ def jobs(repo, args):
     repo_path = os.path.join(args.repos_dir, repo)
     if os.path.exists(repo_path):
         for commit in RepositoryMining(
-            repo_path, only_modifications_with_file_types=['.py']
+            repo_path, only_modifications_with_file_types=language[args.lang]
         ).traverse_commits():
             cleaned_message = message_cleaner(commit.msg)
             tokenized_message = args.tokenizer.tokenize(cleaned_message)
@@ -44,7 +53,7 @@ def jobs(repo, args):
             for mod in commit.modifications:
                 if not (mod.old_path and mod.new_path):
                     continue
-                if os.path.splitext(mod.new_path)[1] != '.py':
+                if os.path.splitext(mod.new_path)[1] not in language[args.lang]:
                     continue
                 if not mod.diff_parsed["added"]:
                     continue
@@ -121,6 +130,9 @@ if __name__ == "__main__":
                         help="directory that all repositories had been downloaded.",)
     parser.add_argument("--output_dir", type=str, required=True,
                         help="The output directory where the preprocessed data will be written.")
+    parser.add_argument("--lang", type=str, required=True,
+                        choices=['py', 'js', 'go', 'java', 'ruby', 'php'],
+                        help="The programming language of the repositories to preprocess.")
     parser.add_argument("--tokenizer_name", type=str,
                         default="microsoft/codebert-base", help="The name of tokenizer",)
     parser.add_argument("--num_workers", default=4, type=int, help="number of process")
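
The per-language extension filter added in this file can be exercised on its own. The sketch below is not the project's code: `LANGUAGE_EXTENSIONS` mirrors the `language` dictionary from the diff, and `has_language_extension` is a hypothetical helper that isolates the `os.path.splitext(...)[1] not in language[args.lang]` check.

```python
import os

# Mirrors the `language` dictionary added to gitparser.py in this commit.
LANGUAGE_EXTENSIONS = {
    "py": [".py"],
    "js": [".js", ".ts"],
    "go": [".go"],
    "java": [".java"],
    "ruby": [".rb"],
    "php": [".php"],
}


def has_language_extension(path, lang):
    """Return True if `path` ends with an extension registered for `lang`.

    Hypothetical helper: a modification is kept only when its file
    extension appears in the list for the selected language.
    """
    extension = os.path.splitext(path)[1]
    return extension in LANGUAGE_EXTENSIONS[lang]


if __name__ == "__main__":
    for candidate in ["src/index.ts", "lib/util.js", "setup.py"]:
        print(candidate, has_language_extension(candidate, "js"))
```

Note that the `js` key also accepts `.ts`, so TypeScript files are mined together with JavaScript when `--lang js` is selected.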
--- a/repositories/javascript.txt 0 → 100644
+++ b/repositories/javascript.txt 0 → 100644
+https://github.com/freeCodeCamp/freeCodeCamp
+https://github.com/vuejs/vue
+https://github.com/facebook/react
+https://github.com/twbs/bootstrap
+https://github.com/airbnb/javascript
+https://github.com/d3/d3
+https://github.com/facebook/react-native
+https://github.com/trekhleb/javascript-algorithms
+https://github.com/facebook/create-react-app
+https://github.com/axios/axios
+https://github.com/nodejs/node
+https://github.com/mrdoob/three.js
+https://github.com/mui-org/material-ui
+https://github.com/angular/angular.js
+https://github.com/vercel/next.js
+https://github.com/webpack/webpack
+https://github.com/jquery/jquery
+https://github.com/hakimel/reveal.js
+https://github.com/atom/atom
+https://github.com/socketio/socket.io
+https://github.com/chartjs/Chart.js
+https://github.com/expressjs/express
+https://github.com/typicode/json-server
+https://github.com/adam-p/markdown-here
+https://github.com/Semantic-Org/Semantic-UI
+https://github.com/h5bp/html5-boilerplate
+https://github.com/gatsbyjs/gatsby
+https://github.com/lodash/lodash
+https://github.com/yangshun/tech-interview-handbook
+https://github.com/moment/moment
+https://github.com/apache/incubator-echarts
+https://github.com/meteor/meteor
+https://github.com/ReactTraining/react-router
+https://github.com/yarnpkg/yarn
+https://github.com/sveltejs/svelte
+https://github.com/Dogfalo/materialize
+https://github.com/prettier/prettier
+https://github.com/serverless/serverless
+https://github.com/babel/babel
+https://github.com/nwjs/nw.js
+https://github.com/juliangarnier/anime
+https://github.com/parcel-bundler/parcel
+https://github.com/ColorlibHQ/AdminLTE
+https://github.com/impress/impress.js
+https://github.com/TryGhost/Ghost
+https://github.com/Unitech/pm2
+https://github.com/mozilla/pdf.js
+https://github.com/mermaid-js/mermaid
+https://github.com/algorithm-visualizer/algorithm-visualizer
+https://github.com/adobe/brackets
+https://github.com/gulpjs/gulp
+https://github.com/hexojs/hexo
+https://github.com/styled-components/styled-components
+https://github.com/nuxt/nuxt.js
+https://github.com/sahat/hackathon-starter
+https://github.com/alvarotrigo/fullPage.js
+https://github.com/strapi/strapi
+https://github.com/immutable-js/immutable-js
+https://github.com/koajs/koa
+https://github.com/videojs/video.js
+https://github.com/zenorocha/clipboard.js
+https://github.com/Leaflet/Leaflet
+https://github.com/RocketChat/Rocket.Chat
+https://github.com/photonstorm/phaser
+https://github.com/quilljs/quill
+https://github.com/jashkenas/backbone
+https://github.com/preactjs/preact
+https://github.com/tastejs/todomvc
+https://github.com/caolan/async
+https://github.com/vuejs/vue-cli
+https://github.com/react-boilerplate/react-boilerplate
+https://github.com/aosabook/500lines
+https://github.com/carbon-app/carbon
+https://github.com/Marak/faker.js
+https://github.com/jashkenas/underscore
+https://github.com/lerna/lerna
+https://github.com/nolimits4web/swiper
+https://github.com/vuejs/vuex
+https://github.com/request/request
+https://github.com/select2/select2
+https://github.com/Modernizr/Modernizr
+https://github.com/facebook/draft-js
+https://github.com/rollup/rollup
+https://github.com/jlmakes/scrollreveal
+https://github.com/tj/commander.js
+https://github.com/chenglou/react-motion
+https://github.com/swagger-api/swagger-ui
+https://github.com/bilibili/flv.js
+https://github.com/segmentio/nightmare
+https://github.com/laurent22/joplin
+https://github.com/react-bootstrap/react-bootstrap
+https://github.com/sampotts/plyr
+https://github.com/avajs/ava
+https://github.com/immerjs/immer
+https://github.com/jorgebucaran/hyperapp
+https://github.com/jaredhanson/passport
+https://github.com/lovell/sharp
+https://github.com/localForage/localForage
+https://github.com/Popmotion/popmotion
+https://github.com/vuejs/vuepress
\ No newline at end of file
--- a/repositories.txt → repositories/python.txt
+++ b/repositories.txt → repositories/python.txt
--- a/tests/javascript/added.diff 0 → 100644
+++ b/tests/javascript/added.diff 0 → 100644
+diff --git a/function.js b/function.js
+new file mode 100644
+index 0000000..ba89d9a
+--- /dev/null
++++ b/function.js
+@@ -0,0 +1,6 @@
++function getIntoAnArgument() {
++    var args = arguments.slice();
++    args.forEach(function(arg) {
++        console.log(arg);
++    });
++}
+\ No newline at end of file
--- a/tests/javascript/fixed.diff 0 → 100644
+++ b/tests/javascript/fixed.diff 0 → 100644
+diff --git a/function.js b/function.js
+index ba89d9a..d440734 100644
+--- a/function.js
++++ b/function.js
+@@ -1,6 +1,3 @@
+-function getIntoAnArgument() {
+-    var args = arguments.slice();
+-    args.forEach(function(arg) {
+-        console.log(arg);
+-    });
++function getIntoAnArgument(...args) {
++    args.forEach(arg => console.log(arg));
+ }
+\ No newline at end of file
--- a/tests/added.diff → tests/python/added.diff
+++ b/tests/added.diff → tests/python/added.diff
--- a/tests/fixed.diff → tests/python/fixed.diff
+++ b/tests/fixed.diff → tests/python/fixed.diff
--- a/tests/test_suite.py
+++ b/tests/test_suite.py
@@ -65,10 +65,6 @@ class CitiesTestCase(unittest.TestCase):
             )
         )
         self.assertEqual(response.status_code, 200)
-        self.assertEqual(
-            json.loads(response.text),
-            {'idx': 0, 'message': ['Test method .']}
-        )
     def test_added(self):
         response = requests.post(
@@ -83,10 +79,6 @@ class CitiesTestCase(unittest.TestCase):
             )
         )
         self.assertEqual(response.status_code, 200)
-        self.assertEqual(
-            json.loads(response.text),
-            {'idx': 0, 'message': ['Fix typo']}
-        )
 def suite():
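
With the exact-message assertions removed, the tests only check the HTTP status code. One possible middle ground, sketched below, is to validate the shape of the response without pinning the generated text. It assumes the payload keeps the form seen in the removed assertions (`{'idx': 0, 'message': ['Fix typo']}`); `is_valid_suggestion` is a hypothetical helper, not part of `tests/test_suite.py`.

```python
def is_valid_suggestion(payload):
    """Check the response shape without comparing the generated message.

    Assumed shape (taken from the removed assertions): a dict with an
    integer 'idx' and a non-empty list of non-blank strings in 'message'.
    """
    if not isinstance(payload, dict):
        return False
    if not isinstance(payload.get("idx"), int):
        return False
    messages = payload.get("message")
    return (
        isinstance(messages, list)
        and len(messages) > 0
        and all(isinstance(m, str) and bool(m.strip()) for m in messages)
    )


if __name__ == "__main__":
    print(is_valid_suggestion({"idx": 0, "message": ["Fix typo"]}))
    print(is_valid_suggestion({"idx": 0, "message": []}))
```

A check like this stays stable across languages and retrained weights, since it does not depend on the exact wording the model produces.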
--- a/weight/added/.keep → weights/python/added/.keep
+++ b/weight/added/.keep → weights/python/added/.keep
--- a/weight/diff/.keep → weights/python/diff/.keep
+++ b/weight/diff/.keep → weights/python/diff/.keep