Delete unnecessary files

김건
Commit 9564bb439865a9545661238f0ad062cd3ab7f8ac 9564bb43 1 parent 21f26f7f
Showing 13 changed files with 0 additions and 1063 deletions
JPype1-0.7.0-cp38-cp38-win_amd64.whl
Youtube/.gitignore
Youtube/LICENSE
Youtube/README.md
Youtube/downloader.py
Youtube/main.py
Youtube/requirements.txt
naverNews/naverNews.md
naverNews/naverNews_crawling.py
twitter/now.md
twitter/readme.md
twitter/twitter.py
youtube.md
--- a/JPype1-0.7.0-cp38-cp38-win_amd64.whl deleted 100644 → 0
+++ b/JPype1-0.7.0-cp38-cp38-win_amd64.whl deleted 100644 → 0
--- a/Youtube/.gitignore deleted 100644 → 0
+++ b/Youtube/.gitignore deleted 100644 → 0
-# Byte-compiled / optimized / DLL files
-__pycache__/
-*.py[cod]
-
-# C extensions
-*.so
-
-# Distribution / packaging
-.Python
-env/
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-*.egg-info/
-.installed.cfg
-*.egg
-
-# PyInstaller
-#  Usually these files are written by a python script from a template
-#  before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*,cover
-
-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-target/
--- a/Youtube/LICENSE deleted 100644 → 0
+++ b/Youtube/LICENSE deleted 100644 → 0
-The MIT License (MIT)
-
-Copyright (c) 2015 Egbert Bouman
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
-
--- a/Youtube/README.md deleted 100644 → 0
+++ b/Youtube/README.md deleted 100644 → 0
-# youtube-comment-downloader
-Simple script for downloading Youtube comments without using the Youtube API. The output is in line delimited JSON.
-
-### Dependencies
-* Python 2.7+
-* requests
-* lxml
-* cssselect
-
-The python packages can be installed with
-
-    pip install requests
-    pip install lxml
-    pip install cssselect
-
-### Usage
-```
-usage: downloader.py [--help] [--youtubeid YOUTUBEID] [--output OUTPUT]
-
-Download Youtube comments without using the Youtube API
-
-optional arguments:
-  --help, -h            Show this help message and exit
-  --youtubeid YOUTUBEID, -y YOUTUBEID
-                        ID of Youtube video for which to download the comments
-  --output OUTPUT, -o OUTPUT
-                        Output filename (output format is line delimited JSON)
-```
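Note on the output format: the README above says the downloader writes line-delimited JSON, and `extract_comments` in `downloader.py` yields the keys `cid`, `text`, `time` and `author`. A minimal sketch of reading such a file back, assuming one JSON object per line. The helper name `read_comments` and the sample records are hypothetical and are not part of the deleted code:

```python
import json
import os
import tempfile


def read_comments(path):
    """Yield one comment dict per non-empty line of a line-delimited JSON file."""
    with open(path, encoding="utf8") as fp:
        for line in fp:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical sample records, written one JSON object per line.
sample = [
    {"cid": "c1", "text": "first", "time": "1 day ago", "author": "alice"},
    {"cid": "c2", "text": "second", "time": "2 days ago", "author": "bob"},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf8") as tmp:
    for comment in sample:
        print(json.dumps(comment, ensure_ascii=False), file=tmp)

loaded = list(read_comments(tmp.name))
os.remove(tmp.name)
print(len(loaded), loaded[0]["author"])
```

Reading lazily with a generator keeps memory use flat for large comment dumps; wrap the call in `list(...)` only when the whole file is needed at once.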
--- a/Youtube/downloader.py deleted 100644 → 0
+++ b/Youtube/downloader.py deleted 100644 → 0
-#!/usr/bin/env python
-
-from __future__ import print_function
-import sys
-import os
-import time
-import json
-import requests
-import argparse
-import lxml.html
-import io
-from urllib.parse import urlparse, parse_qs
-from lxml.cssselect import CSSSelector
-
-YOUTUBE_COMMENTS_URL = 'https://www.youtube.com/all_comments?v={youtube_id}'
-YOUTUBE_COMMENTS_AJAX_URL = 'https://www.youtube.com/comment_ajax'
-
-USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'
-
-
-def find_value(html, key, num_chars=2):
-    pos_begin = html.find(key) + len(key) + num_chars
-    pos_end = html.find('"', pos_begin)
-    return html[pos_begin: pos_end]
-
-
-def extract_comments(html):
-    tree = lxml.html.fromstring(html)
-    item_sel = CSSSelector('.comment-item')
-    text_sel = CSSSelector('.comment-text-content')
-    time_sel = CSSSelector('.time')
-    author_sel = CSSSelector('.user-name')
-
-    for item in item_sel(tree):
-        yield {'cid': item.get('data-cid'),
-               'text': text_sel(item)[0].text_content(),
-               'time': time_sel(item)[0].text_content().strip(),
-               'author': author_sel(item)[0].text_content()}
-
-
-def extract_reply_cids(html):
-    tree = lxml.html.fromstring(html)
-    sel = CSSSelector('.comment-replies-header > .load-comments')
-    return [i.get('data-cid') for i in sel(tree)]
-
-
-def ajax_request(session, url, params, data, retries=10, sleep=20):
-    for _ in range(retries):
-        response = session.post(url, params=params, data=data)
-        if response.status_code == 200:
-            response_dict = json.loads(response.text)
-            return response_dict.get('page_token', None), response_dict['html_content']
-        else:
-            time.sleep(sleep)
-
-
-def download_comments(youtube_id, sleep=1):
-    session = requests.Session()
-    session.headers['User-Agent'] = USER_AGENT
-    # Get Youtube page with initial comments
-    response = session.get(YOUTUBE_COMMENTS_URL.format(youtube_id=youtube_id))
-    html = response.text
-    reply_cids = extract_reply_cids(html)
-
-    ret_cids = []
-    for comment in extract_comments(html):
-        ret_cids.append(comment['cid'])
-        yield comment
-    page_token = find_value(html, 'data-token')
-    session_token = find_value(html, 'XSRF_TOKEN', 4)
-    first_iteration = True
-
-    # Get remaining comments (the same as pressing the 'Show more' button)
-    while page_token:
-        data = {'video_id': youtube_id,
-                'session_token': session_token}
-
-        params = {'action_load_comments': 1,
-                  'order_by_time': True,
-                  'filter': youtube_id}
-
-        if first_iteration:
-            params['order_menu'] = True
-        else:
-            data['page_token'] = page_token
-
-        response = ajax_request(session, YOUTUBE_COMMENTS_AJAX_URL, params, data)
-        if not response:
-            break
-
-        page_token, html = response
-
-        reply_cids += extract_reply_cids(html)
-        for comment in extract_comments(html):
-            if comment['cid'] not in ret_cids:
-                ret_cids.append(comment['cid'])
-                yield comment
-
-        first_iteration = False
-        time.sleep(sleep)
-    # Get replies (the same as pressing the 'View all X replies' link)
-    for cid in reply_cids:
-        data = {'comment_id': cid,
-                'video_id': youtube_id,
-                'can_reply': 1,
-                'session_token': session_token}
-        params = {'action_load_replies': 1,
-                  'order_by_time': True,
-                  'filter': youtube_id,
-                  'tab': 'inbox'}
-        response = ajax_request(session, YOUTUBE_COMMENTS_AJAX_URL, params, data)
-        if not response:
-            break
-
-        _, html = response
-
-        for comment in extract_comments(html):
-            if comment['cid'] not in ret_cids:
-                ret_cids.append(comment['cid'])
-                yield comment
-        time.sleep(sleep)
-
-## Parse the video ID out of the input value
-def video_id(value):
-    query = urlparse(value)
-    if query.hostname == 'youtu.be':
-        return query.path[1:]
-    if query.hostname in ('www.youtube.com', 'youtube.com'):
-        if query.path == '/watch':
-            p = parse_qs(query.query)
-            return p['v'][0]
-        if query.path[:7] == '/embed/':
-            return query.path.split('/')[2]
-        if query.path[:3] == '/v/':
-            return query.path.split('/')[2]
-    # fail?
-    return None
-
-
-def main():
-
-    #parser = argparse.ArgumentParser(add_help=False, description=('Download Youtube comments without using the Youtube API'))
-    #parser.add_argument('--help', '-h', action='help', default=argparse.SUPPRESS, help='Show this help message and exit')
-    #parser.add_argument('--youtubeid', '-y', help='ID of Youtube video for which to download the comments')
-    #parser.add_argument('--output', '-o', help='Output filename (output format is line delimited JSON)')
-    #parser.add_argument('--limit', '-l', type=int, help='Limit the number of comments')
-    Youtube_id1 = input('Enter Youtube_ID :')
-    ## Accept a full link and extract only the video ID from it
-    Youtube_id1 = video_id(Youtube_id1)
-    youtube_id = Youtube_id1
-    try:
-        # args = parser.parse_args(argv)
-
-        #youtube_id = args.youtubeid
-        #output = args.output
-        #limit = args.limit
-        result_List = []
-    ## Read the input values and assign them
-
-    ## If Limit is left empty, 100 is used as the default
-        if not youtube_id :
-            #parser.print_usage()
-            #raise ValueError('you need to specify a Youtube ID and an output filename')
-            raise ValueError('Please enter a valid input value')
-
-        print('Downloading Youtube comments for video:', youtube_id)
-        Number = input(' Save - 0  Do not save - 1 : ')
-        if Number == '0' :
-            Output1 = input('Enter the output file name :')
-            Limit1 = input('Enter the maximum number of comments : ')
-            if Limit1 == '' :
-                Limit1 = 100
-            limit = int(Limit1)
-
-            output = Output1
-            count = 0  # number of comments written so far
-            ##### Values are read with input() instead of command-line arguments
-            with io.open(output, 'w', encoding='utf8') as fp:
-                for comment in download_comments(youtube_id):
-                    comment_json = json.dumps(comment, ensure_ascii=False)
-                    print(comment_json.decode('utf-8') if isinstance(comment_json, bytes) else comment_json, file=fp)
-                    count += 1
-                    sys.stdout.flush()
-                    if limit and count >= limit:
-                        print('Downloaded {} comment(s)\r'.format(count))
-                        print('\nDone!')
-                        break
-
-        else :
-            count = 0
-            i = 0
-            limit = 40
-            for comment in download_comments(youtube_id):
-                dic = {}
-                dic['cid'] = comment['cid']
-                dic['text'] = comment['text']
-                dic['time'] = comment['time']
-                dic['author'] = comment['author']
-                result_List.append(dic)
-                count += 1
-                i += 1
-                if limit  == count :
-                    print(' Comment Thread created')
-                    print ('\n\n\n\n\n\n\n')
-                    break
-        return result_List
-        #goto_Menu(result_List)
-
-
-
-    except Exception as e:
-        print('Error:', str(e))
-        sys.exit(1)
-
-
-if __name__ == "__main__":
-    main()
--- a/Youtube/main.py deleted 100644 → 0
+++ b/Youtube/main.py deleted 100644 → 0
-import downloader
-from time import sleep
-from konlpy.tag import Twitter
-from collections import Counter
-from matplotlib import rc
-import matplotlib.pyplot as plt
-from matplotlib import font_manager as fm
-import pytagcloud
-import operator
-def get_tags (Comment_List) :
-
-    okja = []
-    for temp in Comment_List :
-        okja.append(temp['text'])
-    twitter = Twitter()
-    sentence_tag  =[]
-    for sentence in okja:
-        morph = twitter.pos(sentence)
-        sentence_tag.append(morph)
-        print(morph)
-        print('-'*30)
-    print(sentence_tag)
-    print(len(sentence_tag))
-    print('\n'*3)
-
-    noun_adj_list = []
-    for sentence1 in sentence_tag:
-        for word,tag in sentence1:
-             if len(word) >=2 and tag  == 'Noun':
-                noun_adj_list.append(word)
-    counts = Counter(noun_adj_list)
-    print(' Top 10 most frequent keywords. \n')
-    print(counts.most_common(10))
-    tags2 = counts.most_common(10)
-    taglist = pytagcloud.make_tags(tags2,maxsize=80)
-    pytagcloud.create_tag_image(taglist,'wordcloud.jpg',size =(900,600),fontname ='Nanum Gothic', rectangular = False)
-
-def print_result(Comment_List) :
-    for var in Comment_List :
-        print(var)
-    print('******* Search complete *******')
-    print('\n\n\n')
-
-def search_by_author(Comment_List,author_name) :
-    result_List = []
-
-    for var in Comment_List :
-        if (var['author'] == author_name) :
-            result_List.append(var)
-
-    return result_List
-def search_by_keyword(Comment_List,keyword) :
-        result_List = []
-        for var in Comment_List :
-            print(var['text'])
-            if ( keyword in var['text']) :
-                result_List.append(var)
-
-        return result_List
-def search_by_time(Comment_List,Time_input) :
-    result_List = []
-    for var in Comment_List :
-        if(var['time'] == Time_input) :
-            result_List.append(var)
-    return result_List
-
-def make_time_chart (Comment_List) :
-    result_List = []
-    save_List = []
-    day_dict = {}
-    month_dict = {}
-    year_dict = {}
-    hour_dict = {}
-    minute_dict = {}
-    week_dict = {}
-    for var in Comment_List :
-        result_List.append(var['time'])
-    for i in range(len(result_List)) :
-        print(result_List[i] + ' ')
-    print('\n\n\n\n')
-    temp_List = list(set(result_List))
-    for i in range(len(temp_List)) :
-        print(temp_List[i] + ' ')
-    print('\n\n\n\n')
-    for i in range (len(temp_List)) :
-        result_dict = {}
-        a = result_List.count(temp_List[i])
-        result_dict[temp_List[i]] = a
-        save_List.append(result_dict)
-
-    for i in range (len(save_List)):
-        num = ''
-        data = 0
-        for j in save_List[i] :
-            num = j
-        for k in save_List[i].values() :
-            data = k
-        if num.find('개월') >= 0 :
-            month_dict[num] = k
-        elif num.find('일') >= 0 :
-            day_dict[num] = k
-        elif num.find('년') >= 0 :
-            year_dict[num] = k
-        elif num.find('시간') >= 0 :
-            hour_dict[num] = k
-        elif num.find('주') >= 0 :
-            week_dict[num] = k
-        elif num.find('분') >= 0 :
-            minute_dict[num] = k
-    year_data = sorted(year_dict.items(), key=operator.itemgetter(0))
-    month_data = sorted(month_dict.items(), key=operator.itemgetter(0))
-    week_data = sorted(week_dict.items(), key=operator.itemgetter(0))
-    day_data = sorted(day_dict.items(), key=operator.itemgetter(0))
-    hour_data = sorted(hour_dict.items(), key=operator.itemgetter(0))
-    minute_data = sorted(minute_dict.items(), key=operator.itemgetter(0))
-    #print(month_data)
-    #print(week_data)
-    #print(day_data)
-    make_chart(year_data,month_data,week_data,day_data,hour_data,minute_data)
-
-def make_chart(year_data,month_data,week_data,day_data,hour_data,minute_data) :
-    temp_list =  [year_data,month_data,week_data,day_data,hour_data,minute_data]
-    x_list = []
-    y_list = []
-    print(temp_list)
-    for var1 in temp_list :
-        for var2 in var1 :
-            if(var2[0].find('년')>=0):
-                temp1 = var2[0][0] + 'years'
-                temp2 = int(var2[1])
-                x_list.append(temp1)
-                y_list.append(temp2)
-            elif(var2[0].find('개월')>=0):
-                temp1 = var2[0][0] + 'months'
-                temp2 = int(var2[1])
-                x_list.append(temp1)
-                y_list.append(temp2)
-            elif(var2[0].find('주')>=0):
-                temp1 = var2[0][0] + 'weeks'
-                temp2 = int(var2[1])
-                x_list.append(temp1)
-                y_list.append(temp2)
-            elif(var2[0].find('일')>=0):
-                temp1 = var2[0][0] + 'days'
-                temp2 = int(var2[1])
-                x_list.append(temp1)
-                y_list.append(temp2)
-            elif(var2[0].find('시간')>=0):
-                temp1 = var2[0][0] + 'hours'
-                temp2 = int(var2[1])
-                x_list.append(temp1)
-                y_list.append(temp2)
-            else:
-                temp1 = var2[0][0] + 'minutes'
-                temp2 = int(var2[1])
-                x_list.append(temp1)
-                y_list.append(temp2)
-    print(x_list)
-    plt.bar(x_list,y_list,width = 0.5 , color = "blue")
-    # plt.show() -> display the chart
-    plt.savefig('chart.png',dpi=300)
-    # plt.savefig('chart.png', dpi=300)
-
-def call_main ():
-    print(' Creating Comment Thread \n')
-
-    sleep(1)
-    print(' **************************************************************')
-    print(' **************************************************************')
-    print(' **************************************************************')
-    print(' ********** Ready. Please enter the information. **********  ')
-    print(' **************************************************************')
-    print(' **************************************************************')
-    print(' **************************************************************')
-    a = downloader.main()
-
-    return a
-
-if __name__ == "__main__":
-    CommentList = call_main()
-    make_time_chart(CommentList)
-    ##author_results = search_by_author(CommentList,'광고제거기')
-    ##text_resutls = search_by_keyword(CommentList,'지현')
-    ##get_tags(CommentList)
-    ##print_result(author_results)
-    ##print_result(text_resutls)
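Note on the time chart: `make_time_chart` and `make_chart` above tally comments by their relative-time text and map the Korean unit markers (`년`, `개월`, `주`, `일`, `시간`, `분`) to English labels. A compact sketch of the same tally using `collections.Counter`, assuming time strings of the form `"3일 전"`. The names `UNIT_LABELS`, `bucket_label` and `count_by_time` are hypothetical, and unlike the deleted code this version keeps every digit of the number rather than only the first character:

```python
from collections import Counter

# Korean unit markers and the English label used on the chart axis.
# Multi-character markers come first so the longest marker wins.
UNIT_LABELS = [
    ("개월", "months"),
    ("시간", "hours"),
    ("년", "years"),
    ("주", "weeks"),
    ("일", "days"),
    ("분", "minutes"),
]


def bucket_label(time_text):
    """Turn a relative-time string such as '10개월 전' into '10months'."""
    digits = "".join(ch for ch in time_text if ch.isdigit())
    for marker, label in UNIT_LABELS:
        if marker in time_text:
            return digits + label
    return None  # unrecognized format


def count_by_time(comments):
    """Count comments per time bucket, skipping unrecognized time strings."""
    labels = (bucket_label(c["time"]) for c in comments)
    return Counter(label for label in labels if label)


sample = [
    {"time": "3일 전"},
    {"time": "3일 전"},
    {"time": "10개월 전"},
    {"time": "1주 전"},
    {"time": "unknown"},
]
counts = count_by_time(sample)
print(counts)
```

The resulting `Counter` can be passed to `plt.bar(list(counts.keys()), list(counts.values()))` in place of the separate per-unit dictionaries.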
--- a/Youtube/requirements.txt deleted 100644 → 0
+++ b/Youtube/requirements.txt deleted 100644 → 0
-requests
-beautifulsoup4
-lxml
-cssselect
-### crawling
-pygame
-pytagcloud
-### wordcloud
-JPype1
-### keyword analysis
-matplotlib==3.2.0rc1
--- a/naverNews/naverNews.md deleted 100644 → 0
+++ b/naverNews/naverNews.md deleted 100644 → 0
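-1. Fetching the data
-    1) Use selenium to retrieve data from the web page
-    2) Read the target URL from the user
-    3) Create the driver with Chrome options applied so that it can run headless
-    4) On naverNews the '더보기' (show more) button at the bottom of the comment area has to be
-       clicked repeatedly, so the driver's find_element_by_css_selector function is called on
-       its class, u_cbox_btn_more, until the page runs out
-    5) Parse the resulting page source with BeautifulSoup and use find_all to extract the raw {user ID, comment, posting time} data. (Because of a naverNews restriction, the last 4 characters of each user ID are masked.)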
-    
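-2. Shaping the data into the DataSet to be used
-    1) Extract nicknames (user IDs), comments, and times (posting times) as separate lists
-    2) Store the matching entries of the three lists as a dictionary of the form {user ID, comment, posting time}
-    3) Append each dictionary (info_dic) to the final result list, naverNewsList.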
-    
-3. Function implementation
-    1) KEYWORD-based search
-    2) Most frequent word search
-    3) ID-based search
-    4) Search by time range
-    Several more functions are planned
-    
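-=> Revisions
-    
-    The file that fetches and cleans the data is split out into a module that returns the dataset as a list,
-    so that main can use it. main then reads a url from the user and fetches the data through that module.
-    Keyword-based, id-based, and time-range search functions have been implemented; improvements to the
-    time-range search and a most-frequent-word search are planned next.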
-    
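-* 4th revision
-
-    Resolved the import errors that occurred when the code was split across files (it is now kept in a single file) 
-    The skeleton of the user UI is in place; the details of each function will be implemented next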
-    
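-* 5th revision
-
-    1) Dates taken from the Naver comment area are converted to YYYY-MM-DD format. (To handle 'just now', 'N minutes ago', 'N hours ago', and 'N days ago', 
-    code was added that uses the datetime and timedelta modules to compute the date relative to the current date
-    and store it as YYYY-MM-DD.)
-    2) Implemented a function that takes (start time, end time) and prints the comments that fall within that range
-    
-    Next: most frequent word search and a visual presentation using MATPLOTLIB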
-    
-* 6th revision
-
-    Implemented a function that extracts nouns with konlpy and prints the most frequent words, in descending order of frequency, up to a user-supplied limit
\ No newline at end of file
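Note on the 5th revision: the notes above describe converting relative timestamps ('방금 전', 'N분 전', 'N시간 전', 'N일 전') to YYYY-MM-DD with `datetime` and `timedelta`. A minimal sketch of that conversion follows. The function name `to_iso_date`, the `now` parameter, and the assumption that absolute timestamps start with `YYYY.MM.DD` are illustrative and not confirmed by the source:

```python
import re
from datetime import datetime, timedelta

# Korean relative-time suffixes mapped to timedelta keyword arguments.
_UNITS = {"분 전": "minutes", "시간 전": "hours", "일 전": "days"}


def to_iso_date(text, now=None):
    """Convert a comment timestamp to a YYYY-MM-DD string."""
    now = now or datetime.now()
    text = text.strip()
    if text == "방금 전":  # "just now"
        return now.strftime("%Y-%m-%d")
    match = re.match(r"(\d+)\s*(분 전|시간 전|일 전)$", text)
    if match:
        amount = int(match.group(1))  # handles multi-digit values such as "10일 전"
        unit = _UNITS[match.group(2)]
        return (now - timedelta(**{unit: amount})).strftime("%Y-%m-%d")
    # Assumed absolute format "YYYY.MM.DD. ...": keep the date part only.
    return text[0:10].replace(".", "-")


reference = datetime(2019, 11, 26, 0, 30)
print(to_iso_date("45분 전", now=reference))
print(to_iso_date("10일 전", now=reference))
```

Passing `now` explicitly keeps the conversion deterministic, which makes it easy to check cases that cross midnight.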
--- a/naverNews/naverNews_crawling.py deleted 100644 → 0
+++ b/naverNews/naverNews_crawling.py deleted 100644 → 0
-from selenium import webdriver
-from selenium.common import exceptions
-from bs4 import BeautifulSoup
-from datetime import datetime, timedelta
-from konlpy.tag import Twitter
-from collections import Counter
-import time
-
-
-def getData(url):
-    ## Set Chrome options (to crawl headlessly, i.e. without showing the browser window)
-    options = webdriver.ChromeOptions()
-    #options.add_argument('headless')
-    #options.add_argument("disable-gpu")
-    #_url = "https://entertain.naver.com/ranking/comment/list?oid=144&aid=0000642175" # URL to crawl
-    _url = url # URL to crawl
-    webDriver = "C:\\Users\\user\\Desktop\\chromedriver_win32\\chromedriver.exe"  # local path to the web driver
-    driver = webdriver.Chrome(webDriver,chrome_options=options)
-    #driver = webdriver.Chrome(webDriver)
-    driver.get(_url)
-    pageCnt = 0
-    driver.implicitly_wait(3) # wait up to 3 seconds when looking up elements
-    try:
-        while True: # loop until the comment pages run out
-            # Use the driver's find_element_by_css_selector to find the Naver News comment "more" button and keep clicking it (to the end)
-            driver.find_element_by_css_selector(".u_cbox_btn_more").click() 
-            pageCnt = pageCnt+1
-        
-    except exceptions.ElementNotVisibleException as e: # no more pages
-        pass
-        
-    except Exception as e: # report any other exception
-        print(e)
-    
-    pageSource = driver.page_source # grab the page source
-    result = BeautifulSoup(pageSource, "lxml") # use lxml for faster parsing
-
-    # Extract the raw nickname, text, and time elements
-    comments_raw = result.find_all("span", {"class" : "u_cbox_contents"})
-    nicknames_raw = result.find_all("span", {"class" : "u_cbox_nick"})
-    times_raw = result.find_all("span", {"class" : "u_cbox_date"})
-
-    # Keep only the nickname, text, and time values and collect them into lists
-    comments = [comment.text for comment in comments_raw]
-    nicknames = [nickname.text for nickname in nicknames_raw]
-    times = [time.text for time in times_raw]
-    
-    naverNewsList = []
-    
-    for i in range(len(comments)):
-        info_dic = {'userID' : nicknames[i], 'comment' : comments[i], 'time' : times[i]}
-        naverNewsList.append(info_dic)
-        
-    driver.quit()  # close the browser before returning
-    return naverNewsList
-    
-from time import sleep
-
-def print_cList(c_List) :
-    for item in c_List :
-        print(item)
-
-def search_by_author(c_List,user_ID) :
-        result_List = []
-        for item in c_List :
-           #print(item['userID'])
-            if ( user_ID in item['userID']) :
-                result_List.append(item)
-        return result_List
-
-def search_by_keyword(c_List,keyword) :
-        result_List = []
-        for item in c_List :
-            #print(item['comment'])
-            if ( keyword in item['comment']) :
-                result_List.append(item)
-        return result_List
-
-def refine_time(c_List): # Convert relative times ("N days ago", "N minutes ago", "just now", ...) to YYYY-MM-DD
-    now = datetime.now()
-    
-    for item in c_List:
-        if (item['time'].find('전') != -1): # contains "전" ("ago")
-            if (item['time'].find('일 전') != -1): # "N일 전" (N days ago)
-                _index = item['time'].index('일')
-                _day = -(int)(item['time'][0:_index]) # number of days ago, as a negative integer
-                tempTime = now + timedelta(days=_day)
-                item['time'] = str(tempTime)
-                item['time'] = item['time'][0:10]
-                continue
-            elif (item['time'].find('시간 전') != -1):
-                _index = item['time'].index('시')
-                _time = -(int)(item['time'][0:_index]) # number of hours ago, as a negative integer
-                tempTime = now + timedelta(hours = _time)
-                item['time'] = str(tempTime)
-                item['time'] = item['time'][0:10]
-                continue
-            elif (item['time'].find('분 전') != -1):
-                _index = item['time'].index('분')
-                _minute = -(int)(item['time'][0:_index]) # number of minutes ago, as a negative integer
-                tempTime = now + timedelta(minutes = _minute)
-                item['time'] = str(tempTime)
-                item['time'] = item['time'][0:10]
-                continue
-            elif (item['time'].find('방금 전') != -1):
-                tempTime = now
-                item['time'] = str(tempTime)
-                item['time'] = item['time'][0:10]
-                continue
-            else:
-                item['time'] = item['time'][0:10]
-                continue
-        
-        
-                
-            
-
-def search_by_time(c_List,startTime, endTime) : 
-    result_List = []
-    
-    startYear = int(startTime[0:4])
-    
-    if (int(startTime[5]) == 0): # single-digit month
-        startMonth = int(startTime[6])
-    else:
-        startMonth = int(startTime[5:7])
-        
-    if (int(startTime[8]) == 0): # single-digit day
-        startDay = int(startTime[9])
-    else:
-        startDay = int(startTime[8:10])
-    
-    
-    
-    endYear = int(endTime[0:4])
-    
-    if (int(endTime[5]) == 0): # single-digit month
-        endMonth = int(endTime[6])
-    else:
-        endMonth = int(endTime[5:7])
-        
-    if (int(endTime[8]) == 0): # single-digit day
-        endDay = int(endTime[9])
-    else:
-        endDay = int(endTime[8:10])
-    
-    for item in c_List:
-        itemYear = int(item['time'][0:4])
-        
-        if (int(item['time'][5]) == 0): # single-digit month
-            itemMonth = int(item['time'][6])
-        else:
-            itemMonth = int(item['time'][5:7])
-        
-        if (int(item['time'][8]) == 0): # single-digit day
-            itemDay = int(item['time'][9])
-        else:
-            itemDay = int(item['time'][8:10])
-        
-        if (itemYear >= startYear and itemYear <= endYear):
-            if (itemMonth >= startMonth and itemMonth <= endMonth):
-                if(itemDay >= startDay and itemDay <= endDay):
-                    result_List.append(item)
-    
-    return result_List
-
-def printMostShowed(c_List,limit):
-    temp = ""
-    result = ""
-    for item in c_List:
-        temp = str(item['comment']) + " "
-        result = result + temp
-    
-    sp = Twitter()
-    
-    nouns = sp.nouns(result)
-    
-    _cnt = Counter(nouns)
-    
-    tempList = []
-    repCnt = 0
-    
-    for i,j in _cnt.most_common(limit):
-        print(str(repCnt+1)+'. '+str(i)+" : "+str(j))
-        repCnt += 1
-        
-def printResult(c_List):
-    for i in range(0,len(c_List)):
-        print(c_List[i])
-
-def main ():
-    ## Start screen
-    
-    _star = '*'
-    print(_star.center(30,'*'))
-    print('\n')
-    headString = '< Naver News Crawling >'
-    print(headString.center(30,'*'))
-    print('\n')
-    print(_star.center(30,'*'))
-    
-    
-    # Read the url to search
-    _url = input('Enter the url to search: ')
-    print('Fetching comment_list.....')
-    cList = getData(_url)
-    refine_time(cList)
-    #printMostShowed(cList,10)
-    print('\n')
-    print('Finished fetching comment_list!')
-    
-    while(True):
-        print('***********************************')
-        print('1. Search by nickname')
-        print('2. Search by keyword')
-        print('3. Search by posting date')
-        print('4. Show most frequent words')
-        menu = input('Enter a menu number: ')
-        
-        if(menu == str(1)):
-            print('***********************************')
-            inputID = input('Enter the first 4 characters of the nickname to search for (enter -1 to go back): ')
-            if(inputID == str(-1)):
-                continue
-            _result = search_by_author(cList,inputID)
-            printResult(_result)
-            print(_result)
-        elif(menu == str(2)):
-            print('***********************************')
-            inputKW = input('Enter the keyword to search for (enter -1 to go back): ')
-            if(inputKW == str(-1)):
-                continue
-            _result = search_by_keyword(cList,inputKW)
-            printResult(_result)
-        elif(menu == str(3)):
-            print('***********************************')
-            print('Enter -1 to go back')
-            startTime = input('Enter the start date of the range (YYYY-MM-DD): ')
-            endTime = input('Enter the end date of the range (YYYY-MM-DD): ')
-            
-            if(startTime == str(-1) or endTime == str(-1)):
-                continue
-                
-            _result = search_by_time(cList,startTime,endTime)
-            printResult(_result)
-        elif(menu == str(4)):
-            print('***********************************')
-            inputLimit = input('Enter how many top words to show (1~20): ')
-            while(True):
-                if (int(inputLimit) <= 0 or int(inputLimit) > 20):
-                    inputLimit = input('Enter how many top words to show (1~20): ')
-                else:
-                    break
-                
-            printMostShowed(cList,int(inputLimit))
-        else:
-            print('Invalid input')
-            continue
-            
-
-    
-main()
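Note on the date-range search: a compact way to express a filter like `search_by_time` is to compare whole dates rather than year, month and day separately, so that ranges crossing a month or year boundary are handled. The sketch below assumes the `time` field starts with a ten-character date such as `2019-11-30` or `2019.11.30`. The names `parse_ymd` and `filter_by_date` and the sample comments are hypothetical:

```python
from datetime import date


def parse_ymd(text):
    """Read year, month and day from the first ten characters, whatever the separator."""
    return date(int(text[0:4]), int(text[5:7]), int(text[8:10]))


def filter_by_date(comments, start, end):
    """Return the comments whose 'time' falls within [start, end], inclusive."""
    first, last = parse_ymd(start), parse_ymd(end)
    return [c for c in comments if first <= parse_ymd(c["time"]) <= last]


# Hypothetical sample in the {userID, comment, time} shape built by getData.
comments = [
    {"userID": "abcd****", "comment": "a", "time": "2019-11-30"},
    {"userID": "efgh****", "comment": "b", "time": "2019-12-02"},
    {"userID": "ijkl****", "comment": "c", "time": "2019-12-15"},
]
matched = filter_by_date(comments, "2019-11-25", "2019-12-05")
print([c["comment"] for c in matched])
```

`int("05")` already evaluates to 5, so the leading-zero special cases are not needed when slicing fixed positions.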
--- a/twitter/now.md deleted 100644 → 0
+++ b/twitter/now.md deleted 100644 → 0
-The search feature is now fully implemented,
-and search speed has been improved as well
\ No newline at end of file
--- a/twitter/readme.md deleted 100644 → 0
+++ b/twitter/readme.md deleted 100644 → 0
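-Deleted the unsuitable webtoon branch
-Implemented a new twitter branch
-
-In addition, comments fetched from twitter are now stored as dictionaries
-made up of ID, date, content, and link
-
-Further features will be built on top of this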
\ No newline at end of file
--- a/twitter/twitter.py deleted 100644 → 0
+++ b/twitter/twitter.py deleted 100644 → 0
-import GetOldTweets3 as got
-from bs4 import BeautifulSoup
-
-import datetime
-import time
-from random import uniform
-from tqdm import tqdm_notebook
-
-def get_tweets(criteria):
-    tweet = got.manager.TweetManager.getTweets(criteria)
-    tweet_list = []
-
-    for index in tqdm_notebook(tweet):
-    
-        # Metadata fields
-        username = index.username
-        link = index.permalink 
-        content = index.text
-        tweet_date = index.date.strftime("%Y-%m-%d")
-        retweets = index.retweets
-        favorites = index.favorites
-     
-        # Combine the results
-        info_list = {'username' : username, 'text': content, 'time': tweet_date, 'link': link}
-        tweet_list.append(info_list) 
-        # Pause between items
-        time.sleep(uniform(1,2))
-    print("====================================")
-    if(len(tweet_list) == 0):
-        print("No tweets match the criteria.")
-    else:
-        print(tweet_list)
-    print("====================================")
-days_range = []
-
-start = datetime.datetime.strptime("2019-11-25", "%Y-%m-%d")
-end = datetime.datetime.strptime("2019-11-26", "%Y-%m-%d")
-date_generated = [start + datetime.timedelta(days=x) for x in range(0, (end-start).days)]
-
-for date in date_generated:
-    days_range.append(date.strftime("%Y-%m-%d"))
-print("=== The default tweet collection period is {} to {} ===".format(days_range[0], days_range[-1]))
-print("=== Collecting {} day(s) of data ===".format(len(days_range)))
-
-# Set the collection period
-start_date = days_range[0]
-end_date = (datetime.datetime.strptime(days_range[-1], "%Y-%m-%d") 
-            + datetime.timedelta(days=1)).strftime("%Y-%m-%d") # setUntil excludes the end date, so add one day
-
-my_key = input("Enter a keyword to search for: ")
-
-while(True):
-    temp1 = "The current search term is " + my_key + ". "
-    print(temp1)
-    print("The default period is the most recent day.")
-    print("To keep searches fast, at most 50 results are shown.")
-    print("1. Search by nickname")
-    print("2. Search by keyword")
-    print("3. Search by date")
-    print("4. Exit")
-    userNum = int(input("What would you like to do?: "))
-    
-    if userNum == 1:
-        nick = input("Enter a nickname to search for: ")
-        print("1. Show only the 10 most recent")
-        print("2. Show 50 tweets from this nickname")
-        print("3. Apply the current search term")
-        tweetNum = int(input("What would you like to do?: "))
-        if(tweetNum == 1):
-            tweetCriteria = got.manager.TweetCriteria().setUsername(nick)\
-                                           .setSince(start_date)\
-                                           .setUntil(end_date)\
-                                           .setMaxTweets(10)
-            get_tweets(tweetCriteria)
-        elif(tweetNum == 2):
-            tweetCriteria = got.manager.TweetCriteria().setUsername(nick)\
-                                           .setSince(start_date)\
-                                           .setUntil(end_date)\
-                                           .setMaxTweets(50)
-            get_tweets(tweetCriteria)
-        elif(tweetNum == 3):
-            tweetCriteria = got.manager.TweetCriteria().setUsername(nick)\
-                                           .setQuerySearch(my_key)\
-                                           .setSince(start_date)\
-                                           .setUntil(end_date)\
-                                           .setMaxTweets(50)
-            get_tweets(tweetCriteria)
-        else:
-            print("Invalid option selected.")
-    elif userNum == 2:
-        my_key = input("Enter a keyword to search for: ")
-        tweetCriteria = got.manager.TweetCriteria().setQuerySearch(my_key)\
-                                           .setSince(start_date)\
-                                           .setUntil(end_date)\
-                                           .setMaxTweets(50)
-        get_tweets(tweetCriteria)
-    elif userNum == 3:
-        user_start = int(input("Enter the start date (yyyymmdd): "))
-        if(user_start < 20170000 or user_start > 20191200):
-            print("Only the last 3 years can be searched.")
-            continue
-        user_end = int(input("Enter the end date (yyyymmdd): "))
-        if(user_end > 20191200):
-            print("You cannot go into the future.")
-            continue
-        elif(user_end < user_start):
-            print("The end date cannot be earlier than the start date.")
-            continue
-        if(user_end - 8 > user_start):
-            print("You can search at most one week.")
-            continue
-        else:
-            start_year = user_start // 10000
-            start_month = user_start // 100 - start_year * 100
-            start_day = user_start - start_year * 10000 - start_month * 100
-            end_year = user_end // 10000
-            end_month = user_end // 100 - end_year * 100
-            end_day = user_end - end_year * 10000 - end_month * 100
-            d1 = str(start_year) + "-" + str(start_month) + "-" + str(start_day)
-            # d2 is for display; d3 is what goes into the query (it needs +1 when passed to the code)
-            d2 = str(end_year) + "-" + str(end_month) + "-" + str(end_day)
-            d3 = str(end_year) + "-" + str(end_month) + "-" + str(end_day + 1)
-            print("1. Search with the current keyword")
-            print("2. Search with a different keyword")
-            myNum = int(input("Which would you like to choose?: "))
-            if(myNum == 1):
-                print("1. Search with a nickname applied")
-                print("2. Search all tweets regardless of nickname")
-                myNum1 = int(input("Which would you like to choose?: "))
-                if(myNum1 == 1):
-                    nick2 = input("Enter a nickname to search for: ")
-                    tweetCriteria = got.manager.TweetCriteria().setUsername(nick2)\
-                                           .setQuerySearch(my_key)\
-                                           .setSince(d1)\
-                                           .setUntil(d3)\
-                                           .setMaxTweets(50)
-                elif(myNum1 == 2):
-                    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(my_key)\
-                                           .setSince(d1)\
-                                           .setUntil(d3)\
-                                           .setMaxTweets(50)
-                else:
-                    print("Invalid input.")
-                    continue
-            elif(myNum == 2):
-                my_key = input("Enter a keyword to search for: ")
-                print("1. Search with a nickname applied")
-                print("2. Search all tweets regardless of nickname")
-                myNum2 = int(input("Which would you like to choose?: "))
-                if(myNum2 == 1):
-                    nick2 = input("Enter a nickname to search for: ")
-                    tweetCriteria = got.manager.TweetCriteria().setUsername(nick2)\
-                                           .setQuerySearch(my_key)\
-                                           .setSince(d1)\
-                                           .setUntil(d3)\
-                                           .setMaxTweets(50)
-                elif(myNum2 == 2):
-                    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(my_key)\
-                                           .setSince(d1)\
-                                           .setUntil(d3)\
-                                           .setMaxTweets(50)
-                else:
-                    print("Invalid input.")
-                    continue
-            else:
-                print("Invalid input.")
-                continue
-            print("=== The current tweet collection period is {} to {} ===".format(d1, d2))
-            print("=== Collecting {} day(s) of data ===".format(user_end - user_start))
-            get_tweets(tweetCriteria)
-    elif userNum == 4:
-        break
-    else:
-        print("Invalid input.")
-        continue
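Note on the date menu in the deleted twitter.py: it splits a yyyymmdd integer by hand and forms the `setUntil` value by adding 1 to the day number, which appears to yield an invalid date at a month boundary (for example 2019-11-30 would become 2019-11-31) and drops zero padding. A minimal sketch of the same step using `datetime` follows. `parse_range` and `MAX_DAYS` are hypothetical names introduced here, not part of the deleted script, and the 7-day limit is an assumption based on the "one week" message.

```python
import datetime

MAX_DAYS = 7  # assumed limit, mirroring the "one week" message in the script


def parse_range(user_start, user_end):
    """Return (since, until, days) for two yyyymmdd strings.

    `until` is one day past the end date, because the script's own
    comment notes that setUntil does not include the end date.
    Raises ValueError for malformed dates or an invalid range.
    """
    start = datetime.datetime.strptime(user_start, "%Y%m%d").date()
    end = datetime.datetime.strptime(user_end, "%Y%m%d").date()
    if end < start:
        raise ValueError("end date is earlier than start date")
    days = (end - start).days + 1  # inclusive count of calendar days
    if days > MAX_DAYS:
        raise ValueError("range is longer than {} days".format(MAX_DAYS))
    until = end + datetime.timedelta(days=1)  # rolls over months and years
    return start.isoformat(), until.isoformat(), days
```

The returned `since` and `until` strings are in the same "%Y-%m-%d" form that the script passes to `setSince` and `setUntil`, and `days` could replace the `user_end - user_start` subtraction in the summary message.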
--- a/youtube.md deleted 100644 → 0
View file @21f26f7
+++ b/youtube.md deleted 100644 → 0
View file @21f26f7
-Youtube: 3rd revision notes
------------------------------------------------------
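-Items to implement in addition to the 1st iteration
-
-1. A function that takes the command-line parameters via input
-2. A function that loads a list from a csv file
-3. Functions that process the collected data
- * A function that finds the most frequent keyword
- * A function that searches by author
- * A function that shows the comments I wrote
- * A function that finds the person who posted the most comments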
------------------------------------------------------
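-2nd update
-
-1. Changed the command-line parameters so they are read via input
-2. Added a prompt asking whether to save to a csv file; if not, the results are stored in a List as Dictionary entries
-3. Ran a test to check that the values stored in the List print correctly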
------------------------------------------------------
-Further items to implement
-
-1. Split into modules (a List-returning module and a Main part) -> if they are not split, additional functions will have to be implemented
-2. Further work is needed on how to split up and serve the Data Set
-
------------------------------------------------------
-
-1. Fixed the errors from the 2nd iteration
-2. Implemented some functions for processing the fetched Comments
- (1) A function to search by keyword
- (2) A function to search by author name
-
------------------------------------------------------
-Further items to implement
-
-1. Extract nouns with konlpy (http://konlpy.org/ko/latest/) and analyze keywords
-2. Extract timestamps and organize Comments by time of day
------------------------------------------------------
-4th iteration
-
-1. Analyzed keywords with konlpy and printed a list of the most frequent keywords
-2. Built a wordcloud using feature 1
-3. Implemented search by time of day
-4. Implemented a list sorted by time of day
------------------------------------------------------
-Further items to implement
-
-1. Chart the time-sorted list with matplotlib
-2. Organize the features so each can be accessed individually
------------------------------------------------------
-5th iteration
-
-1. Charted the time-sorted list with matplotlib
\ No newline at end of file
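Note on the youtube.md notes above: they describe sorting comments by time of day and charting the result. The sketch below shows one way that grouping step might look. It assumes each comment is a dict with a `time` key formatted as "%Y-%m-%d %H:%M:%S"; that layout and the name `comments_per_hour` are hypothetical, since the deleted downloader's data format is not shown in this diff.

```python
from collections import Counter
import datetime


def comments_per_hour(comments):
    """Count comments per hour of day (0-23), sorted by hour.

    Assumes each comment dict has a 'time' string such as
    '2019-11-25 09:15:00' (an assumption, not the original format).
    """
    counts = Counter(
        datetime.datetime.strptime(c["time"], "%Y-%m-%d %H:%M:%S").hour
        for c in comments
    )
    return sorted(counts.items())
```

The resulting (hour, count) pairs can be unzipped into two sequences and passed to a bar chart call such as matplotlib's `plt.bar`.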