Merge pull request #1 from graykode/0.1.0

JavaScript Language is supported!!

Merge pull request #1 from graykode/0.1.0
JavaScript Language is supported!!
graykode · GitHub
Commit 4f96aeae5f221141b647128bd2cadcfbf36f9c11 4f96aeae 2 parents 65a5c78c 974ec169
Showing 18 changed files with 209 additions and 30 deletions
.travis.yml
README.md
app.py
change_logs/v0.1.0.md
commit_autosuggestions.ipynb
docker/javascript/Dockerfile
docker/Dockerfile → docker/python/Dockerfile
docs/training.md
gitparser.py
repositories/javascript.txt
repositories.txt → repositories/python.txt
tests/javascript/added.diff
tests/javascript/fixed.diff
tests/added.diff → tests/python/added.diff
tests/fixed.diff → tests/python/fixed.diff
tests/test_suite.py
weight/added/.keep → weights/python/added/.keep
weight/diff/.keep → weights/python/diff/.keep
--- a/.travis.yml
View file @4f96aea
+++ b/.travis.yml
View file @4f96aea
@@ -2,12 +2,15 @@ language: python
 python:
   - "3.6"
 
+ env:
+   - LANGUAGE="py"
+ 
 services:
   - docker
 
 before_install:
-   - docker pull graykode/commit-autosuggestions
-   - docker run -it -d -p 5000:5000 --restart always graykode/commit-autosuggestions
+   - docker pull graykode/commit-autosuggestions:${LANGUAGE}
+   - docker run -it -d -p 5000:5000 --restart always graykode/commit-autosuggestions:${LANGUAGE}
 
 # command to install dependencies
 install:
--- a/README.md
View file @4f96aea
+++ b/README.md
View file @4f96aea
@@ -46,20 +46,18 @@ Recommended Commit Message : Remove unused imports
 To solve this problem, use a new embedding called [`patch_type_embeddings`](https://github.com/graykode/commit-autosuggestions/blob/master/commit/model/diff_roberta.py#L40) that can distinguish added and deleted, just as the XLM(Lample et al, 2019) used language embeddeding. (1 for added, 2 for deleted.)
 
 ### Language support
- | Language       | Added | Diff |
- | :------------- | :---: | :---:|
- | Python         | ✅    | ✅    |
- | JavaScript     | ⬜    | ⬜    |
- | Go             | ⬜    | ⬜    |
- | JAVA           | ⬜    | ⬜    |
- | Ruby           | ⬜    | ⬜    |
- | PHP            | ⬜    | ⬜    |
+ | Language       | Added | Diff |  Data(Only Diff) | Weights |
+ | :------------- | :---: | :---:| :---: | :---:|
+ | Python         | ✅    | ✅   | [423k](https://drive.google.com/drive/folders/1_8lQmzTH95Nc-4MKd1RP3x4BVc8tBA6W?usp=sharing) |  [Link](https://drive.google.com/drive/folders/1OwM7_FiLiwVJAhAanBPWtPw3Hz3Dszbh?usp=sharing)  |
+ | JavaScript     | ✅    | ✅   | [514k](https://drive.google.com/drive/folders/1-Hv0VZWSAGqs-ewNT6NhLKEqDH2oa1az?usp=sharing) |  [Link](https://drive.google.com/drive/folders/1Jw8vXfxUXsfElga_Gi6e7Uhfc_HlmOuD?usp=sharing)  |
+ | Go             | ⬜    | ⬜   | ⬜ |  ⬜  |
+ | JAVA           | ⬜    | ⬜   | ⬜ |  ⬜  |
+ | Ruby           | ⬜    | ⬜   | ⬜ |  ⬜  |
+ | PHP            | ⬜    | ⬜   | ⬜ |  ⬜  |
 * ✅ — Supported
- * 🔶 — Partial support
- * 🚧 — Under development
 * ⬜ - N/A ️
 
- We plan to slowly conquer languages that are not currently supported. However, I also need to use expensive GPU instances of AWS or GCP to train about the above languages. Please do a simple sponsor for this!
+ We plan to slowly conquer languages that are not currently supported. However, I also need to use expensive GPU instances of AWS or GCP to train about the above languages. Please do a simple sponsor for this! Add data is [CodeSearchNet dataset](https://drive.google.com/uc?id=1rd2Tc6oUWBo7JouwexW3ksQ0PaOhUr6h).
 
 ### Quick Start
 To run this project, you need a flask-based inference server (GPU) and a client (commit module). If you don't have a GPU, don't worry, you can use it through Google Colab.
@@ -68,9 +66,18 @@ To run this project, you need a flask-based inference server (GPU) and a client 
 Prepare Docker and Nvidia-docker before running the server.
 
 ##### 1-a. If you have GPU machine.
- Serve flask server with Nvidia Docker
+ Serve flask server with Nvidia Docker. Check the docker tag for programming language in [here](https://hub.docker.com/repository/registry-1.docker.io/graykode/commit-autosuggestions/tags).
+ | Language       | Tag   |
+ | :------------- | :---: |
+ | Python         | py    |
+ | JavaScript     | js    |
+ | Go             | go    |
+ | JAVA           | java  |
+ | Ruby           | ruby  |
+ | PHP            | php   |
+ 
 ```shell script
- $ docker run -it --gpus 0 -p 5000:5000 commit-autosuggestions:0.1-gpu
+ $ docker run -it -d --gpus 0 -p 5000:5000 graykode/commit-autosuggestions:{language}
 ```
 
 ##### 1-b. If you don't have GPU machine.
--- a/app.py
View file @4f96aea
+++ b/app.py
View file @4f96aea
@@ -146,7 +146,7 @@ def main(args):
 
 if __name__ == '__main__':
     parser = argparse.ArgumentParser(description="")
-     parser.add_argument("--load_model_path", default='weight', type=str,
+     parser.add_argument("--load_model_path", type=str, required=True,
                         help="Path to trained model: Should contain the .bin files")
 
     parser.add_argument("--model_type", default='roberta', type=str,
--- a/change_logs/v0.1.0.md 0 → 100644
View file @4f96aea
+++ b/change_logs/v0.1.0.md 0 → 100644
View file @4f96aea
+ # Change Log
+ version : v0.1.0
+ 
+ ## change things
+ 
+ ### Bug Fixes
+ - Modify the weight path in the Dockerfile.
+ 
+ ### New Features
+ - JavaScript Language Support.
+ - Detach multiple settings (Unittest, Dockerfile) for Language support.
+ 
+ ### New Examples
\ No newline at end of file
--- a/commit_autosuggestions.ipynb
View file @4f96aea
+++ b/commit_autosuggestions.ipynb
View file @4f96aea
--- a/docker/javascript/Dockerfile 0 → 100644
View file @4f96aea
+++ b/docker/javascript/Dockerfile 0 → 100644
View file @4f96aea
+ FROM nvcr.io/nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
+ LABEL maintainer="nlkey2022@gmail.com"
+ 
+ RUN DEBIAN_FRONTEND=noninteractive apt-get -qq update \
+  && DEBIAN_FRONTEND=noninteractive apt-get -qqy install curl python3-pip git \
+  && rm -rf /var/lib/apt/lists/*
+ 
+ ARG PYTORCH_WHEEL="https://download.pytorch.org/whl/cu101/torch-1.6.0%2Bcu101-cp36-cp36m-linux_x86_64.whl"
+ ARG ADDED_MODEL="1-F68ymKxZ-htCzQ8_Y9iHexs2SJmP5Gc"
+ ARG DIFF_MODEL="1-39rmu-3clwebNURMQGMt-oM4HsAkbsf"
+ 
+ RUN git clone https://github.com/graykode/commit-autosuggestions.git /app/commit-autosuggestions \
+     && cd /app/commit-autosuggestions
+ 
+ WORKDIR /app/commit-autosuggestions
+ 
+ RUN pip3 install ${PYTORCH_WHEEL} gdown
+ RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/javascript/added/
+ RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/javascript/diff/
+ 
+ RUN pip3 install -r requirements.txt
+ 
+ ENTRYPOINT ["python3", "app.py", "--load_model_path", "./weight/javascript/"]
--- a/docker/Dockerfile → docker/python/Dockerfile
View file @4f96aea
+++ b/docker/Dockerfile → docker/python/Dockerfile
View file @4f96aea
@@ -10,14 +10,14 @@ ARG ADDED_MODEL="1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4"
 ARG DIFF_MODEL="1--gcVVix92_Fp75A-mWH0pJS0ahlni5m"
 
 RUN git clone https://github.com/graykode/commit-autosuggestions.git /app/commit-autosuggestions \
-     && cd /app/commit-autosuggestions && python3 setup.py install
+     && cd /app/commit-autosuggestions
 
 WORKDIR /app/commit-autosuggestions
 
 RUN pip3 install ${PYTORCH_WHEEL} gdown
- RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/added/
- RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/diff/
+ RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/python/added/
+ RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/python/diff/
 
 RUN pip3 install -r requirements.txt
 
- ENTRYPOINT ["python3", "app.py"]
+ ENTRYPOINT ["python3", "app.py", "--load_model_path", "./weight/python/"]
--- a/docs/training.md
View file @4f96aea
+++ b/docs/training.md
View file @4f96aea
@@ -104,6 +104,8 @@ optional arguments:
                         The maximum total target sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded.
 ```
 
+ > If `UnicodeDecodeError` occurs while using gitparser.py, you must use the [GitPython](https://github.com/gitpython-developers/GitPython) package at least [this commit](https://github.com/gitpython-developers/GitPython/commit/bfbd5ece215dea328c3c6c4cba31225caa66ae9a).
+ 
 #### 3. Training Added model(Optional for Python Language).
 Python has learned the Added model. So, if you only want to make a Diff model for the Python language, step 3 can be ignored. However, for other languages (JavaScript, GO, Ruby, PHP and JAVA), [Code2NL training](https://github.com/microsoft/CodeBERT#fine-tune-1) is required to use as the initial weight of the model to be used in step 4.
 
--- a/gitparser.py
View file @4f96aea
+++ b/gitparser.py
View file @4f96aea
@@ -24,6 +24,15 @@ from multiprocessing.pool import Pool
 from transformers import RobertaTokenizer
 from pydriller import RepositoryMining
 
+ language = {
+     'py' : ['.py'],
+     'js' : ['.js', '.ts'],
+     'go' : ['.go'],
+     'java' : ['.java'],
+     'ruby' : ['.rb'],
+     'php' : ['.php']
+ }
+ 
 def message_cleaner(message):
     msg = message.split("\n")[0]
     msg = re.sub(r"(\(|)#([0-9])+(\)|)", "", msg)
@@ -34,7 +43,7 @@ def jobs(repo, args):
     repo_path = os.path.join(args.repos_dir, repo)
     if os.path.exists(repo_path):
         for commit in RepositoryMining(
-             repo_path, only_modifications_with_file_types=['.py']
+             repo_path, only_modifications_with_file_types=language[args.lang]
         ).traverse_commits():
             cleaned_message = message_cleaner(commit.msg)
             tokenized_message = args.tokenizer.tokenize(cleaned_message)
@@ -44,7 +53,7 @@ def jobs(repo, args):
             for mod in commit.modifications:
                 if not (mod.old_path and mod.new_path):
                     continue
-                 if os.path.splitext(mod.new_path)[1] != '.py':
+                 if os.path.splitext(mod.new_path)[1] not in language[args.lang]:
                     continue
                 if not mod.diff_parsed["added"]:
                     continue
@@ -121,6 +130,9 @@ if __name__ == "__main__":
                         help="directory that all repositories had been downloaded.",)
     parser.add_argument("--output_dir", type=str, required=True,
                         help="The output directory where the preprocessed data will be written.")
+     parser.add_argument("--lang", type=str, required=True,
+                         choices=['py', 'js', 'go', 'java', 'ruby', 'php'],
+                         help="The output directory where the preprocessed data will be written.")
     parser.add_argument("--tokenizer_name", type=str,
                         default="microsoft/codebert-base", help="The name of tokenizer",)
     parser.add_argument("--num_workers", default=4, type=int, help="number of process")
--- a/repositories/javascript.txt 0 → 100644
View file @4f96aea
+++ b/repositories/javascript.txt 0 → 100644
View file @4f96aea
+ https://github.com/freeCodeCamp/freeCodeCamp
+ https://github.com/vuejs/vue
+ https://github.com/facebook/react
+ https://github.com/twbs/bootstrap
+ https://github.com/airbnb/javascript
+ https://github.com/d3/d3
+ https://github.com/facebook/react-native
+ https://github.com/trekhleb/javascript-algorithms
+ https://github.com/facebook/create-react-app
+ https://github.com/axios/axios
+ https://github.com/nodejs/node
+ https://github.com/mrdoob/three.js
+ https://github.com/mui-org/material-ui
+ https://github.com/angular/angular.js
+ https://github.com/vercel/next.js
+ https://github.com/webpack/webpack
+ https://github.com/jquery/jquery
+ https://github.com/hakimel/reveal.js
+ https://github.com/atom/atom
+ https://github.com/socketio/socket.io
+ https://github.com/chartjs/Chart.js
+ https://github.com/expressjs/express
+ https://github.com/typicode/json-server
+ https://github.com/adam-p/markdown-here
+ https://github.com/Semantic-Org/Semantic-UI
+ https://github.com/h5bp/html5-boilerplate
+ https://github.com/gatsbyjs/gatsby
+ https://github.com/lodash/lodash
+ https://github.com/yangshun/tech-interview-handbook
+ https://github.com/moment/moment
+ https://github.com/apache/incubator-echarts
+ https://github.com/meteor/meteor
+ https://github.com/ReactTraining/react-router
+ https://github.com/yarnpkg/yarn
+ https://github.com/sveltejs/svelte
+ https://github.com/Dogfalo/materialize
+ https://github.com/prettier/prettier
+ https://github.com/serverless/serverless
+ https://github.com/babel/babel
+ https://github.com/nwjs/nw.js
+ https://github.com/juliangarnier/anime
+ https://github.com/parcel-bundler/parcel
+ https://github.com/ColorlibHQ/AdminLTE
+ https://github.com/impress/impress.js
+ https://github.com/TryGhost/Ghost
+ https://github.com/Unitech/pm2
+ https://github.com/mozilla/pdf.js
+ https://github.com/mermaid-js/mermaid
+ https://github.com/algorithm-visualizer/algorithm-visualizer
+ https://github.com/adobe/brackets
+ https://github.com/gulpjs/gulp
+ https://github.com/hexojs/hexo
+ https://github.com/styled-components/styled-components
+ https://github.com/nuxt/nuxt.js
+ https://github.com/sahat/hackathon-starter
+ https://github.com/alvarotrigo/fullPage.js
+ https://github.com/strapi/strapi
+ https://github.com/immutable-js/immutable-js
+ https://github.com/koajs/koa
+ https://github.com/videojs/video.js
+ https://github.com/zenorocha/clipboard.js
+ https://github.com/Leaflet/Leaflet
+ https://github.com/RocketChat/Rocket.Chat
+ https://github.com/photonstorm/phaser
+ https://github.com/quilljs/quill
+ https://github.com/jashkenas/backbone
+ https://github.com/preactjs/preact
+ https://github.com/tastejs/todomvc
+ https://github.com/caolan/async
+ https://github.com/vuejs/vue-cli
+ https://github.com/react-boilerplate/react-boilerplate
+ https://github.com/aosabook/500lines
+ https://github.com/carbon-app/carbon
+ https://github.com/Marak/faker.js
+ https://github.com/jashkenas/underscore
+ https://github.com/lerna/lerna
+ https://github.com/nolimits4web/swiper
+ https://github.com/vuejs/vuex
+ https://github.com/request/request
+ https://github.com/select2/select2
+ https://github.com/Modernizr/Modernizr
+ https://github.com/facebook/draft-js
+ https://github.com/rollup/rollup
+ https://github.com/jlmakes/scrollreveal
+ https://github.com/tj/commander.js
+ https://github.com/chenglou/react-motion
+ https://github.com/swagger-api/swagger-ui
+ https://github.com/bilibili/flv.js
+ https://github.com/segmentio/nightmare
+ https://github.com/laurent22/joplin
+ https://github.com/react-bootstrap/react-bootstrap
+ https://github.com/sampotts/plyr
+ https://github.com/avajs/ava
+ https://github.com/immerjs/immer
+ https://github.com/jorgebucaran/hyperapp
+ https://github.com/jaredhanson/passport
+ https://github.com/lovell/sharp
+ https://github.com/localForage/localForage
+ https://github.com/Popmotion/popmotion
+ https://github.com/vuejs/vuepress
\ No newline at end of file
--- a/repositories.txt → repositories/python.txt
View file @4f96aea
+++ b/repositories.txt → repositories/python.txt
View file @4f96aea
--- a/tests/javascript/added.diff 0 → 100644
View file @4f96aea
+++ b/tests/javascript/added.diff 0 → 100644
View file @4f96aea
+ diff --git a/function.js b/function.js
+ new file mode 100644
+ index 0000000..ba89d9a
+ --- /dev/null
+ +++ b/function.js
+ @@ -0,0 +1,6 @@
+ +function getIntoAnArgument() {
+ +    var args = arguments.slice();
+ +    args.forEach(function(arg) {
+ +        console.log(arg);
+ +    });
+ +}
+ \ No newline at end of file
--- a/tests/javascript/fixed.diff 0 → 100644
View file @4f96aea
+++ b/tests/javascript/fixed.diff 0 → 100644
View file @4f96aea
+ diff --git a/function.js b/function.js
+ index ba89d9a..d440734 100644
+ --- a/function.js
+ +++ b/function.js
+ @@ -1,6 +1,3 @@
+ -function getIntoAnArgument() {
+ -    var args = arguments.slice();
+ -    args.forEach(function(arg) {
+ -        console.log(arg);
+ -    });
+ +function getIntoAnArgument(...args) {
+ +    args.forEach(arg => console.log(arg));
+  }
+ \ No newline at end of file
--- a/tests/added.diff → tests/python/added.diff
View file @4f96aea
+++ b/tests/added.diff → tests/python/added.diff
View file @4f96aea
--- a/tests/fixed.diff → tests/python/fixed.diff
View file @4f96aea
+++ b/tests/fixed.diff → tests/python/fixed.diff
View file @4f96aea
--- a/tests/test_suite.py
View file @4f96aea
+++ b/tests/test_suite.py
View file @4f96aea
@@ -65,10 +65,6 @@ class CitiesTestCase(unittest.TestCase):
             )
         )
         self.assertEqual(response.status_code, 200)
-         self.assertEqual(
-             json.loads(response.text),
-             {'idx': 0, 'message': ['Test method .']}
-         )
 
     def test_added(self):
         response = requests.post(
@@ -83,10 +79,6 @@ class CitiesTestCase(unittest.TestCase):
             )
         )
         self.assertEqual(response.status_code, 200)
-         self.assertEqual(
-             json.loads(response.text),
-             {'idx': 0, 'message': ['Fix typo']}
-         )
 
 
 def suite():
--- a/weight/added/.keep → weights/python/added/.keep
View file @4f96aea
+++ b/weight/added/.keep → weights/python/added/.keep
View file @4f96aea
--- a/weight/diff/.keep → weights/python/diff/.keep
View file @4f96aea
+++ b/weight/diff/.keep → weights/python/diff/.keep
View file @4f96aea