graykode
Committed by GitHub

Merge pull request #1 from graykode/0.1.0

JavaScript Language is supported!!
......@@ -2,12 +2,15 @@ language: python
python:
- "3.6"
env:
- LANGUAGE="py"
services:
- docker
before_install:
- docker pull graykode/commit-autosuggestions
- docker run -it -d -p 5000:5000 --restart always graykode/commit-autosuggestions
- docker pull graykode/commit-autosuggestions:${LANGUAGE}
- docker run -it -d -p 5000:5000 --restart always graykode/commit-autosuggestions:${LANGUAGE}
# command to install dependencies
install:
......
......@@ -46,20 +46,18 @@ Recommended Commit Message : Remove unused imports
To solve this problem, use a new embedding called [`patch_type_embeddings`](https://github.com/graykode/commit-autosuggestions/blob/master/commit/model/diff_roberta.py#L40) that can distinguish added code from deleted code, just as XLM (Lample et al., 2019) uses language embeddings (1 for added, 2 for deleted).
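As a rough illustration of the idea (not the repository's implementation), the sketch below assigns one patch type id per token: 1 for added and 2 for deleted, as stated above. The function name `build_patch_type_ids`, the id 0 for special tokens, and the `<s>`/`</s>` layout are assumptions made for this example only.

```python
from typing import List, Tuple

ADDED, DELETED = 1, 2  # ids taken from the text above
SPECIAL = 0            # assumption: id reserved for special tokens


def build_patch_type_ids(
    added_tokens: List[str], deleted_tokens: List[str]
) -> Tuple[List[str], List[int]]:
    """Concatenate added and deleted tokens and label each position."""
    # Hypothetical layout: <s> added... </s> deleted... </s>
    tokens = ["<s>"] + added_tokens + ["</s>"] + deleted_tokens + ["</s>"]
    type_ids = (
        [SPECIAL]
        + [ADDED] * len(added_tokens)
        + [SPECIAL]
        + [DELETED] * len(deleted_tokens)
        + [SPECIAL]
    )
    return tokens, type_ids


tokens, type_ids = build_patch_type_ids(["import", "os"], ["import", "sys", "re"])
print(list(zip(tokens, type_ids)))
```

In a model, ids like these would typically index an embedding table whose vectors are summed with the token embeddings, which is the role language embeddings play in XLM.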
### Language support
| Language | Added | Diff |
| :------------- | :---: | :---:|
| Python | ✅ | ✅ |
| JavaScript | ⬜ | ⬜ |
| Go | ⬜ | ⬜ |
| JAVA | ⬜ | ⬜ |
| Ruby | ⬜ | ⬜ |
| PHP | ⬜ | ⬜ |
| Language | Added | Diff | Data(Only Diff) | Weights |
| :------------- | :---: | :---:| :---: | :---:|
| Python | ✅ | ✅ | [423k](https://drive.google.com/drive/folders/1_8lQmzTH95Nc-4MKd1RP3x4BVc8tBA6W?usp=sharing) | [Link](https://drive.google.com/drive/folders/1OwM7_FiLiwVJAhAanBPWtPw3Hz3Dszbh?usp=sharing) |
| JavaScript | ✅ | ✅ | [514k](https://drive.google.com/drive/folders/1-Hv0VZWSAGqs-ewNT6NhLKEqDH2oa1az?usp=sharing) | [Link](https://drive.google.com/drive/folders/1Jw8vXfxUXsfElga_Gi6e7Uhfc_HlmOuD?usp=sharing) |
| Go | ⬜ | ⬜ | ⬜ | ⬜ |
| JAVA | ⬜ | ⬜ | ⬜ | ⬜ |
| Ruby | ⬜ | ⬜ | ⬜ | ⬜ |
| PHP | ⬜ | ⬜ | ⬜ | ⬜ |
* ✅ — Supported
* 🔶 — Partial support
* 🚧 — Under development
* ⬜ - N/A
We plan to gradually add support for the languages that are not yet covered. However, training models for them requires expensive GPU instances on AWS or GCP. Please consider sponsoring this work!
We plan to gradually add support for the languages that are not yet covered. However, training models for them requires expensive GPU instances on AWS or GCP. Please consider sponsoring this work! The data for the Added model is the [CodeSearchNet dataset](https://drive.google.com/uc?id=1rd2Tc6oUWBo7JouwexW3ksQ0PaOhUr6h).
### Quick Start
To run this project, you need a flask-based inference server (GPU) and a client (commit module). If you don't have a GPU, don't worry, you can use it through Google Colab.
......@@ -68,9 +66,18 @@ To run this project, you need a flask-based inference server (GPU) and a client
Prepare Docker and Nvidia-docker before running the server.
##### 1-a. If you have GPU machine.
Serve flask server with Nvidia Docker
Serve the Flask server with Nvidia Docker. Check the Docker tag for each programming language [here](https://hub.docker.com/repository/registry-1.docker.io/graykode/commit-autosuggestions/tags).
| Language | Tag |
| :------------- | :---: |
| Python | py |
| JavaScript | js |
| Go | go |
| JAVA | java |
| Ruby | ruby |
| PHP | php |
```shell script
$ docker run -it --gpus 0 -p 5000:5000 commit-autosuggestions:0.1-gpu
$ docker run -it -d --gpus 0 -p 5000:5000 graykode/commit-autosuggestions:{language}
```
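For scripted setups, the tag table above can be turned into a small helper that fills in the `{language}` placeholder. This is only a sketch: the names `LANGUAGE_TAGS` and `docker_run_command` are hypothetical, and the command it builds simply mirrors the `docker run` line shown above.

```python
import shlex

# Tags copied from the table above.
LANGUAGE_TAGS = {
    "Python": "py",
    "JavaScript": "js",
    "Go": "go",
    "JAVA": "java",
    "Ruby": "ruby",
    "PHP": "php",
}


def docker_run_command(language: str, gpu: str = "0", port: int = 5000) -> str:
    """Build the `docker run` command line for the given language."""
    tag = LANGUAGE_TAGS[language]  # raises KeyError for unknown languages
    args = [
        "docker", "run", "-it", "-d",
        "--gpus", gpu,
        "-p", f"{port}:{port}",
        f"graykode/commit-autosuggestions:{tag}",
    ]
    # Quote each argument so the string is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


print(docker_run_command("JavaScript"))
```

Only Python and JavaScript are marked as supported in the language table, so images for the other tags may not be available yet.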
##### 1-b. If you don't have GPU machine.
......
......@@ -146,7 +146,7 @@ def main(args):
if __name__ == '__main__':
parser = argparse.ArgumentParser(description="")
parser.add_argument("--load_model_path", default='weight', type=str,
parser.add_argument("--load_model_path", type=str, required=True,
help="Path to trained model: Should contain the .bin files")
parser.add_argument("--model_type", default='roberta', type=str,
......
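The change above makes `--load_model_path` mandatory, which is why the Dockerfiles in this commit pass it explicitly in `ENTRYPOINT`. The following sketch reproduces only that one argument to show the effect; `build_parser` is a hypothetical helper, not a function from `app.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the argument changed in this commit."""
    parser = argparse.ArgumentParser(description="")
    parser.add_argument("--load_model_path", type=str, required=True,
                        help="Path to trained model: Should contain the .bin files")
    return parser


# With the path supplied, parsing succeeds.
args = build_parser().parse_args(["--load_model_path", "./weight/python/"])
print(args.load_model_path)

# Without it, argparse prints a usage error to stderr and exits.
try:
    build_parser().parse_args([])
except SystemExit as exc:
    print("argparse exited with code", exc.code)
```

Dropping the default means a missing weight directory fails at startup instead of silently loading from a `weight` path that no longer matches the per-language layout.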
# Change Log
version : v0.1.0
## change things
### Bug Fixes
- Modify the weight path in the Dockerfile.
### New Features
- JavaScript Language Support.
- Detach multiple settings (Unittest, Dockerfile) for Language support.
### New Examples
\ No newline at end of file
FROM nvcr.io/nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
LABEL maintainer="nlkey2022@gmail.com"
RUN DEBIAN_FRONTEND=noninteractive apt-get -qq update \
&& DEBIAN_FRONTEND=noninteractive apt-get -qqy install curl python3-pip git \
&& rm -rf /var/lib/apt/lists/*
ARG PYTORCH_WHEEL="https://download.pytorch.org/whl/cu101/torch-1.6.0%2Bcu101-cp36-cp36m-linux_x86_64.whl"
ARG ADDED_MODEL="1-F68ymKxZ-htCzQ8_Y9iHexs2SJmP5Gc"
ARG DIFF_MODEL="1-39rmu-3clwebNURMQGMt-oM4HsAkbsf"
RUN git clone https://github.com/graykode/commit-autosuggestions.git /app/commit-autosuggestions \
&& cd /app/commit-autosuggestions
WORKDIR /app/commit-autosuggestions
RUN pip3 install ${PYTORCH_WHEEL} gdown
RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/javascript/added/
RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/javascript/diff/
RUN pip3 install -r requirements.txt
ENTRYPOINT ["python3", "app.py", "--load_model_path", "./weight/javascript/"]
......@@ -10,14 +10,14 @@ ARG ADDED_MODEL="1YrkwfM-0VBCJaa9NYaXUQPODdGPsmQY4"
ARG DIFF_MODEL="1--gcVVix92_Fp75A-mWH0pJS0ahlni5m"
RUN git clone https://github.com/graykode/commit-autosuggestions.git /app/commit-autosuggestions \
&& cd /app/commit-autosuggestions && python3 setup.py install
&& cd /app/commit-autosuggestions
WORKDIR /app/commit-autosuggestions
RUN pip3 install ${PYTORCH_WHEEL} gdown
RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/added/
RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/diff/
RUN gdown https://drive.google.com/uc?id=${ADDED_MODEL} -O weight/python/added/
RUN gdown https://drive.google.com/uc?id=${DIFF_MODEL} -O weight/python/diff/
RUN pip3 install -r requirements.txt
ENTRYPOINT ["python3", "app.py"]
ENTRYPOINT ["python3", "app.py", "--load_model_path", "./weight/python/"]
......
......@@ -104,6 +104,8 @@ optional arguments:
The maximum total target sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded.
```
> If a `UnicodeDecodeError` occurs while using gitparser.py, you must use a version of the [GitPython](https://github.com/gitpython-developers/GitPython) package that includes [this commit](https://github.com/gitpython-developers/GitPython/commit/bfbd5ece215dea328c3c6c4cba31225caa66ae9a) or later.
#### 3. Training Added model(Optional for Python Language).
The Added model has already been trained for Python. So, if you only want to build a Diff model for the Python language, step 3 can be skipped. However, for other languages (JavaScript, Go, Ruby, PHP and JAVA), [Code2NL training](https://github.com/microsoft/CodeBERT#fine-tune-1) is required, because its result is used as the initial weight of the model trained in step 4.
......
......@@ -24,6 +24,15 @@ from multiprocessing.pool import Pool
from transformers import RobertaTokenizer
from pydriller import RepositoryMining
language = {
'py' : ['.py'],
'js' : ['.js', '.ts'],
'go' : ['.go'],
'java' : ['.java'],
'ruby' : ['.rb'],
'php' : ['.php']
}
def message_cleaner(message):
msg = message.split("\n")[0]
msg = re.sub(r"(\(|)#([0-9])+(\)|)", "", msg)
......@@ -34,7 +43,7 @@ def jobs(repo, args):
repo_path = os.path.join(args.repos_dir, repo)
if os.path.exists(repo_path):
for commit in RepositoryMining(
repo_path, only_modifications_with_file_types=['.py']
repo_path, only_modifications_with_file_types=language[args.lang]
).traverse_commits():
cleaned_message = message_cleaner(commit.msg)
tokenized_message = args.tokenizer.tokenize(cleaned_message)
......@@ -44,7 +53,7 @@ def jobs(repo, args):
for mod in commit.modifications:
if not (mod.old_path and mod.new_path):
continue
if os.path.splitext(mod.new_path)[1] != '.py':
if os.path.splitext(mod.new_path)[1] not in language[args.lang]:
continue
if not mod.diff_parsed["added"]:
continue
......@@ -121,6 +130,9 @@ if __name__ == "__main__":
help="directory that all repositories had been downloaded.",)
parser.add_argument("--output_dir", type=str, required=True,
help="The output directory where the preprocessed data will be written.")
parser.add_argument("--lang", type=str, required=True,
choices=['py', 'js', 'go', 'java', 'ruby', 'php'],
help="The programming language of the repositories to preprocess.")
parser.add_argument("--tokenizer_name", type=str,
default="microsoft/codebert-base", help="The name of tokenizer",)
parser.add_argument("--num_workers", default=4, type=int, help="number of process")
......
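The extension check introduced in this file can be tried in isolation. The sketch below copies the `language` mapping from the diff and wraps the `os.path.splitext` test in a helper; the name `is_target_file` is an assumption for illustration and does not appear in the repository.

```python
import os

# Mapping copied from the diff above.
language = {
    'py': ['.py'],
    'js': ['.js', '.ts'],
    'go': ['.go'],
    'java': ['.java'],
    'ruby': ['.rb'],
    'php': ['.php'],
}


def is_target_file(path: str, lang: str) -> bool:
    """Return True if the file extension belongs to the selected language."""
    return os.path.splitext(path)[1] in language[lang]


for path in ["src/index.ts", "lib/util.js", "components/App.jsx", "README.md"]:
    print(path, is_target_file(path, "js"))
```

With this mapping, `.jsx` and `.tsx` files are skipped for `js`, since only `.js` and `.ts` are listed.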
https://github.com/freeCodeCamp/freeCodeCamp
https://github.com/vuejs/vue
https://github.com/facebook/react
https://github.com/twbs/bootstrap
https://github.com/airbnb/javascript
https://github.com/d3/d3
https://github.com/facebook/react-native
https://github.com/trekhleb/javascript-algorithms
https://github.com/facebook/create-react-app
https://github.com/axios/axios
https://github.com/nodejs/node
https://github.com/mrdoob/three.js
https://github.com/mui-org/material-ui
https://github.com/angular/angular.js
https://github.com/vercel/next.js
https://github.com/webpack/webpack
https://github.com/jquery/jquery
https://github.com/hakimel/reveal.js
https://github.com/atom/atom
https://github.com/socketio/socket.io
https://github.com/chartjs/Chart.js
https://github.com/expressjs/express
https://github.com/typicode/json-server
https://github.com/adam-p/markdown-here
https://github.com/Semantic-Org/Semantic-UI
https://github.com/h5bp/html5-boilerplate
https://github.com/gatsbyjs/gatsby
https://github.com/lodash/lodash
https://github.com/yangshun/tech-interview-handbook
https://github.com/moment/moment
https://github.com/apache/incubator-echarts
https://github.com/meteor/meteor
https://github.com/ReactTraining/react-router
https://github.com/yarnpkg/yarn
https://github.com/sveltejs/svelte
https://github.com/Dogfalo/materialize
https://github.com/prettier/prettier
https://github.com/serverless/serverless
https://github.com/babel/babel
https://github.com/nwjs/nw.js
https://github.com/juliangarnier/anime
https://github.com/parcel-bundler/parcel
https://github.com/ColorlibHQ/AdminLTE
https://github.com/impress/impress.js
https://github.com/TryGhost/Ghost
https://github.com/Unitech/pm2
https://github.com/mozilla/pdf.js
https://github.com/mermaid-js/mermaid
https://github.com/algorithm-visualizer/algorithm-visualizer
https://github.com/adobe/brackets
https://github.com/gulpjs/gulp
https://github.com/hexojs/hexo
https://github.com/styled-components/styled-components
https://github.com/nuxt/nuxt.js
https://github.com/sahat/hackathon-starter
https://github.com/alvarotrigo/fullPage.js
https://github.com/strapi/strapi
https://github.com/immutable-js/immutable-js
https://github.com/koajs/koa
https://github.com/videojs/video.js
https://github.com/zenorocha/clipboard.js
https://github.com/Leaflet/Leaflet
https://github.com/RocketChat/Rocket.Chat
https://github.com/photonstorm/phaser
https://github.com/quilljs/quill
https://github.com/jashkenas/backbone
https://github.com/preactjs/preact
https://github.com/tastejs/todomvc
https://github.com/caolan/async
https://github.com/vuejs/vue-cli
https://github.com/react-boilerplate/react-boilerplate
https://github.com/aosabook/500lines
https://github.com/carbon-app/carbon
https://github.com/Marak/faker.js
https://github.com/jashkenas/underscore
https://github.com/lerna/lerna
https://github.com/nolimits4web/swiper
https://github.com/vuejs/vuex
https://github.com/request/request
https://github.com/select2/select2
https://github.com/Modernizr/Modernizr
https://github.com/facebook/draft-js
https://github.com/rollup/rollup
https://github.com/jlmakes/scrollreveal
https://github.com/tj/commander.js
https://github.com/chenglou/react-motion
https://github.com/swagger-api/swagger-ui
https://github.com/bilibili/flv.js
https://github.com/segmentio/nightmare
https://github.com/laurent22/joplin
https://github.com/react-bootstrap/react-bootstrap
https://github.com/sampotts/plyr
https://github.com/avajs/ava
https://github.com/immerjs/immer
https://github.com/jorgebucaran/hyperapp
https://github.com/jaredhanson/passport
https://github.com/lovell/sharp
https://github.com/localForage/localForage
https://github.com/Popmotion/popmotion
https://github.com/vuejs/vuepress
\ No newline at end of file
diff --git a/function.js b/function.js
new file mode 100644
index 0000000..ba89d9a
--- /dev/null
+++ b/function.js
@@ -0,0 +1,6 @@
+function getIntoAnArgument() {
+ var args = arguments.slice();
+ args.forEach(function(arg) {
+ console.log(arg);
+ });
+}
\ No newline at end of file
diff --git a/function.js b/function.js
index ba89d9a..d440734 100644
--- a/function.js
+++ b/function.js
@@ -1,6 +1,3 @@
-function getIntoAnArgument() {
- var args = arguments.slice();
- args.forEach(function(arg) {
- console.log(arg);
- });
+function getIntoAnArgument(...args) {
+ args.forEach(arg => console.log(arg));
}
\ No newline at end of file
......@@ -65,10 +65,6 @@ class CitiesTestCase(unittest.TestCase):
)
)
self.assertEqual(response.status_code, 200)
self.assertEqual(
json.loads(response.text),
{'idx': 0, 'message': ['Test method .']}
)
def test_added(self):
response = requests.post(
......@@ -83,10 +79,6 @@ class CitiesTestCase(unittest.TestCase):
)
)
self.assertEqual(response.status_code, 200)
self.assertEqual(
json.loads(response.text),
{'idx': 0, 'message': ['Fix typo']}
)
def suite():
......