2021 Ajou University Spring SW Capstone Design - FindU NLP (gold prize winner at the 2021 College Student Paper Contest of DCS)

SWCapstone2021/NLP

FindU NLP

Script-based video search and summarization service


Website • Dependency • Features • Contributors • License • Reference

This project was carried out as part of the 2021 Ajou University Spring SW Capstone Design course.
This repository holds the NLP source code of FindU (찾아봐유).
It has no commercial purpose and was developed by team APC.

🏆 Gold prize winner, 2021 College Student Paper Contest of the Korea Digital Contents Society (DCS) 🏆

Website

Visit our website FindU 😀

Dependency

FindU-NLP is based on torch==1.8.1 (CUDA 11.1) and Python 3.8.

For the full list of dependencies, see requirements.

Features

STT

FindU needs a script (transcript) of the video to work. Many YouTube videos have no script, and although YouTube offers an automatic caption generation feature, it does not produce usable captions for Korean. We therefore build and use our own STT model tailored to Korean.

Dataset: AIHub
Model: DeepSpeech2
Period: Iteration 1~3
Model path: 'STT/models/ds2.pt'

from STT import load_stt_model, stt

stt_model, stt_vocab = load_stt_model()  # load the model and vocab once, when the server starts
audio_path = 'your/audio_path/origin_audio.wav'

sentences = stt(stt_model, stt_vocab, audio_path)  # list of (time, caption) tuples
# sentences[0] -> (3.2, "The transcribed caption is printed here.")

Ctrl+F feature

Finds where in the video a keyword is spoken. Given a keyword, it returns a list of the start times of the lines that contain it.

from basefunction import ctrl_f

# json_file: the video's script in JSON form (not defined in this snippet)
SearchingValue = input("keyword:")
timestamp = ctrl_f(SearchingValue, json_file)
# timestamp -> ['00', '00', ...]  start times of the lines that contain SearchingValue
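The lookup described above can be sketched in a few lines. This is only an illustration, not the repository's `ctrl_f`: it assumes the script is already available as a list of `(start_time, text)` tuples like the STT output, and `find_keyword_times` is a hypothetical name.

```python
from typing import List, Tuple


def find_keyword_times(sentences: List[Tuple[float, str]], keyword: str) -> List[float]:
    """Return the start time of every script line whose text contains the keyword."""
    keyword = keyword.strip()
    if not keyword:
        # An empty keyword would match every line, so treat it as "no result".
        return []
    return [start for start, text in sentences if keyword in text]


# Hypothetical script: (start time in seconds, caption)
script = [
    (0.0, "hello and welcome"),
    (3.2, "today we talk about speech models"),
    (7.5, "speech recognition is hard"),
]

print(find_keyword_times(script, "speech"))  # [3.2, 7.5]
```

A plain substring test keeps the sketch short; the actual matching rule and the timestamp format used by the project may differ.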

Reliability feature

Shows as a number how closely a video's title and content are related. The reliability score is computed as the cosine similarity between the sentence vector of the title and the sentence vector of the content. The score ranges from 0 to 10.

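from wordembedding import cosin_similar, load_wm_model  # assumption: load_wm_model is also exported by wordembedding

model = load_wm_model()  # load the word embedding model once, when the server starts
# json_file: the video's script in JSON form (not defined in this snippet)
SearchingValue = input("keyword:")

score = cosin_similar(SearchingValue, json_file, model)
# score -> 0.3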
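To make the scoring idea concrete, here is a minimal sketch under stated assumptions. It builds a sentence vector by averaging word vectors and maps the cosine similarity onto 0 to 10 by clamping negative values and multiplying by 10. Both choices are assumptions made for illustration, as are the toy embeddings and the names `sentence_vector` and `reliability_score`; the source does not say how `cosin_similar` does either step.

```python
import math
from typing import Dict, List, Sequence


def sentence_vector(tokens: Sequence[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all known tokens; return [] if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 for empty or zero vectors."""
    if not a or not b:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reliability_score(title_tokens, content_tokens, embeddings) -> float:
    """Map cosine similarity to a 0-10 score (assumed scaling: clamp at 0, times 10)."""
    sim = cosine_similarity(
        sentence_vector(title_tokens, embeddings),
        sentence_vector(content_tokens, embeddings),
    )
    return round(10 * max(0.0, sim), 2)


# Toy 2-dimensional embeddings, for illustration only.
toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.8, 0.6], "car": [0.0, 1.0]}

print(reliability_score(["cat"], ["dog"], toy_embeddings))  # 8.0
```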

Word embedding + Ctrl+F feature (association_f)

Finds where in the video the keyword and the words associated with it are spoken.

from wordembedding import association_f, load_wm_model  # assumption: load_wm_model is also exported by wordembedding

model = load_wm_model()  # load the word embedding model once, when the server starts
# json_file: the video's script in JSON form (not defined in this snippet)
SearchingValue = input("keyword:")

association_f(SearchingValue, json_file, model)
# -> ['00', '00', ...]  timestamps of the lines containing SearchingValue or one of its associated words
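One way to picture the combination is: expand the keyword with its nearest neighbours in embedding space, then run the same substring search over the script. The sketch below does this with toy vectors. It is not the repository's `association_f`; the function names, the `top_n` parameter, and the `(start_time, text)` script layout are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def associated_words(keyword: str, embeddings: Dict[str, List[float]], top_n: int = 2) -> List[str]:
    """Return the top_n vocabulary words most similar to the keyword."""
    if keyword not in embeddings:
        return []
    target = embeddings[keyword]
    candidates = [w for w in embeddings if w != keyword]
    candidates.sort(key=lambda w: cosine(target, embeddings[w]), reverse=True)
    return candidates[:top_n]


def association_search(
    sentences: List[Tuple[float, str]],
    keyword: str,
    embeddings: Dict[str, List[float]],
    top_n: int = 2,
) -> List[float]:
    """Start times of lines containing the keyword or any associated word."""
    terms = [keyword] + associated_words(keyword, embeddings, top_n)
    return [start for start, text in sentences if any(term in text for term in terms)]


# Toy data, for illustration only.
toy_embeddings = {"dog": [0.9, 0.1], "puppy": [0.85, 0.2], "car": [0.1, 0.9]}
script = [(0.0, "my puppy is asleep"), (4.0, "the car is red"), (9.5, "a dog barked")]

print(association_search(script, "dog", toy_embeddings, top_n=1))  # [0.0, 9.5]
```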

QA System

Users can ask questions in more natural, human-like language and get the matching answers from the video.

Dataset: KorQuAD 1.0, KorQuAD 2.0
Model: bert-multilingual
Period: Iteration 20
Model path: 'QA/models/*'

from QA import load_qa_model, QA_system

qa_model, qa_tokenizer = load_qa_model()  # load the model and tokenizer once, when the server starts

# json_script: the video's script in JSON form (not defined in this snippet)
question = 'Your Question'
answers = QA_system(qa_model, qa_tokenizer, question, json_script)  # list of (index, answer); index is the script line where the answer appears
# answers[0] -> (index, "answer")
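The `(index, answer)` pairing can be illustrated without a model. The sketch below assumes the script lines are joined into one context string and that an extractive QA model reports the character offset where its answer starts; it then maps that offset back to the script line. The helper names and the single-space join are assumptions, and `str.find` merely stands in for the model's predicted offset.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def build_context(lines: List[str]) -> Tuple[str, List[int]]:
    """Join script lines with single spaces and record each line's start offset."""
    if not lines:
        return "", []
    # Each line occupies len(line) characters plus one separating space.
    ends = list(accumulate(len(line) + 1 for line in lines))
    offsets = [0] + ends[:-1]
    return " ".join(lines), offsets


def line_index(offsets: List[int], char_start: int) -> int:
    """Index of the script line that contains the given character offset."""
    return bisect_right(offsets, char_start) - 1


# Hypothetical script lines, for illustration only.
lines = [
    "FindU searches videos by script.",
    "The STT model is DeepSpeech2.",
    "It was trained on AIHub data.",
]
context, offsets = build_context(lines)

answer = "DeepSpeech2"
start = context.find(answer)  # stand-in for the offset a QA model would predict
print((line_index(offsets, start), answer))  # (1, 'DeepSpeech2')
```

Binary search over the recorded offsets keeps the lookup cheap even for long scripts.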

Summarization

Summarizes the entire script into about three lines.

from Summarization import load_sc_model, summary_script
from pororo import Pororo

# json_file: the video's script in JSON form (not defined in this snippet)
summ_model = load_sc_model()
summarized_texts = summary_script(json_file, summ_model)
# summarized_texts -> "Text.Text.Text."
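For readers who want a feel for what a three-line summary involves, the following sketch uses a simple word-frequency extractive heuristic. This is a swapped-in technique for illustration only, not the PORORO-based model the project loads through `load_sc_model`; the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text: str, max_sentences: int = 3) -> str:
    """Keep the max_sentences highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return " ".join(sentences)
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        # Average corpus frequency of the words in the sentence.
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


long_text = (
    "Speech models turn audio into text. The weather was nice. "
    "Speech text can be searched by keyword. "
    "Keyword search over speech text returns timestamps. Lunch was late."
)
print(summarize(long_text))
```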

Contributors

Maintainer : 남희수, 오승민

Contributor : 강한결, 김수연, 허범수

License

FindU-NLP project is licensed under the terms of the Apache License 2.0.

Reference

PORORO

🤗transformers
