I'm on vacation, and here I am writing posts like this... T_T <div>If you'd like to read the original, please follow the source link. Fair warning: it's a nearly empty homepage...<br><div><br></div> <div><header class="post-header" style="margin-bottom:30px;color:#111111;font-family:'맑은 고딕', sans-serif;font-size:16px;line-height:24px;background-color:#fdfdfd;"><h1 class="post-title" style="margin:0px 0px 15px;padding:0px;font-weight:300;font-size:42px;letter-spacing:-1px;line-height:1;">Hangul Jamo Decomposition</h1> <p class="post-meta" style="margin:0px 0px 15px;padding:0px;font-size:14px;color:#828282;">Apr 26, 2015</p></header><article class="post-content" style="margin-bottom:30px;color:#111111;font-family:'맑은 고딕', sans-serif;font-size:16px;line-height:24px;background-color:#fdfdfd;"><h1 style="margin:0px 0px 15px;padding:0px;font-weight:300;">Hangul in Rust</h1> <p style="margin:0px 0px 15px;padding:0px;">In Rust, a string (&amp;str or String) is a UTF-8-encoded sequence of <a target="_blank" href="http://www.unicode.org/glossary/#unicode_scalar_value" style="color:#1756a9;text-decoration:none;" target="_blank">Unicode Scalar Values</a>, and a <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">char</code> likewise holds a single Unicode Scalar Value.</p> <p style="margin:0px 0px 15px;padding:0px;">For details on how Unicode handles Hangul, see this article on Naver's developer blog (Hello World): <a target="_blank" href="http://helloworld.naver.com/helloworld/76650" style="color:#1756a9;text-decoration:none;" target="_blank">Understanding Hangul Encoding, Part 2: Hangul Processing with Unicode and Java</a></p> <h1 style="margin:0px 0px 15px;padding:0px;font-weight:300;">Hangul Jamo Decomposition</h1> <ul style="margin:0px 0px 15px 30px;padding:0px;"><li><em>This section is excerpted from the article linked above.</em></li></ul><p style="margin:0px 0px 15px;padding:0px;">The Hangul characters in everyday use are precomposed syllables, each combining an initial consonant (choseong), a medial vowel (jungseong), and optionally a final consonant (jongseong). They occupy the range <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">가 (U+AC00)</code> to <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">힣 (U+D7A3)</code>, which holds 19 × 21 × 28 = 11,172 syllables.</p> 
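<p style="margin:0px 0px 15px;padding:0px;">Because the precomposed syllables form one contiguous block, testing whether a <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">char</code> is a Hangul syllable is just a range check. The sketch below uses a hypothetical helper named <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">is_hangul_syllable</code> (it is not part of Rust's standard library). Note that it returns false for standalone compatibility jamo such as ㄱ, which live in a different block (U+3131 onward).</p> <div class="highlight" style="margin-bottom:15px;background:#ffffff;"><pre style="margin-top:0px;margin-bottom:15px;padding:8px 12px;font-size:15px;border:1px solid rgb(232,232,232);background-color:#eeeeff;"><code class="language-rust" style="border:0px;padding:1px 0px;">/// Hypothetical helper (not part of std): true if `c` is a precomposed
/// Hangul syllable in the block 가 (U+AC00) ..= 힣 (U+D7A3).
/// Standalone jamo such as 'ㄱ' (U+3131) are not syllables and return false.
fn is_hangul_syllable(c: char) -> bool {
    matches!(c, '\u{AC00}'..='\u{D7A3}')
}

fn main() {
    // The block holds exactly 19 * 21 * 28 = 11,172 syllables.
    let count = ('\u{AC00}'..='\u{D7A3}').count();
    assert_eq!(count, 19 * 21 * 28);

    // Print each character, its code point, and whether it is a syllable.
    for c in "Rust 한글 ㄱ".chars() {
        println!("{:?} U+{:04X} {}", c, c as u32, is_hangul_syllable(c));
    }
}</code></pre></div> <p style="margin:0px 0px 15px;padding:0px;">This is the same bounds test that <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">syllables_to_jamo</code> in the implementation section applies before decomposing a character.</p>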
<p style="margin:0px 0px 15px;padding:0px;">The code point of a Hangul syllable is the base value U+AC00 plus ((initial index × 21) + medial index) × 28 + final index, using the indices in the following table (final index 0 means there is no final consonant).</p> <div class="highlight" style="margin-bottom:15px;background:#ffffff;"><pre style="margin-top:0px;margin-bottom:15px;padding:8px 12px;font-size:15px;border:1px solid rgb(232,232,232);background-color:#eeeeff;"><code class="language-text" style="border:0px;padding:1px 0px;">| Index | Initial | Medial | Final  |   | Index | Initial | Medial | Final |
|-------|---------|--------|--------|---|-------|---------|--------|-------|
| 0     | ㄱ      | ㅏ     | (none) |   | 14    | ㅊ      | ㅝ     | ㄿ    |
| 1     | ㄲ      | ㅐ     | ㄱ     |   | 15    | ㅋ      | ㅞ     | ㅀ    |
| 2     | ㄴ      | ㅑ     | ㄲ     |   | 16    | ㅌ      | ㅟ     | ㅁ    |
| 3     | ㄷ      | ㅒ     | ㄳ     |   | 17    | ㅍ      | ㅠ     | ㅂ    |
| 4     | ㄸ      | ㅓ     | ㄴ     |   | 18    | ㅎ      | ㅡ     | ㅄ    |
| 5     | ㄹ      | ㅔ     | ㄵ     |   | 19    |         | ㅢ     | ㅅ    |
| 6     | ㅁ      | ㅕ     | ㄶ     |   | 20    |         | ㅣ     | ㅆ    |
| 7     | ㅂ      | ㅖ     | ㄷ     |   | 21    |         |        | ㅇ    |
| 8     | ㅃ      | ㅗ     | ㄹ     |   | 22    |         |        | ㅈ    |
| 9     | ㅅ      | ㅘ     | ㄺ     |   | 23    |         |        | ㅊ    |
| 10    | ㅆ      | ㅙ     | ㄻ     |   | 24    |         |        | ㅋ    |
| 11    | ㅇ      | ㅚ     | ㄼ     |   | 25    |         |        | ㅌ    |
| 12    | ㅈ      | ㅛ     | ㄽ     |   | 26    |         |        | ㅍ    |
| 13    | ㅉ      | ㅜ     | ㄾ     |   | 27    |         |        | ㅎ    |</code></pre></div> <p style="margin:0px 0px 15px;padding:0px;">For example, '한' is made up of 'ㅎ', 'ㅏ', and 'ㄴ', whose indices are 18, 0, and 4, so its code point is <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">U+AC00 + ((18 × 21) + 0) × 28 + 4 = U+AC00 + 10588 (0x295C) = U+D55C</code>.</p> <p style="margin:0px 0px 15px;padding:0px;">Working backwards, any Hangul syllable can be split into its initial, medial, and final components. 
Specifically, let <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">x</code> be the syllable's code point minus U+AC00. Then:</p> <ul style="margin:0px 0px 15px 30px;padding:0px;"><li>Initial: the quotient of <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">x</code> divided by (21 × 28)</li> <li>Medial: the remainder of <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">x</code> divided by (21 × 28), then divided by 28 (quotient only)</li> <li>Final: the remainder of <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">x</code> divided by 28</li></ul><h1 style="margin:0px 0px 15px;padding:0px;font-weight:300;">Implementation in Rust</h1> <div class="highlight" style="margin-bottom:15px;background:#ffffff;"><pre style="margin-top:0px;margin-bottom:15px;padding:8px 12px;font-size:15px;border:1px solid rgb(232,232,232);background-color:#eeeeff;"><code class="language-rust" style="border:0px;padding:1px 0px;">/// Initial consonants (choseong), in Unicode order
static CHO: [char; 19] = [
    'ㄱ', 'ㄲ', 'ㄴ', 'ㄷ', 'ㄸ', 'ㄹ', 'ㅁ', 'ㅂ', 'ㅃ', 'ㅅ',
    'ㅆ', 'ㅇ', 'ㅈ', 'ㅉ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ',
];
/// Medial vowels (jungseong), in Unicode order
static JUNG: [char; 21] = [
    'ㅏ', 'ㅐ', 'ㅑ', 'ㅒ', 'ㅓ', 'ㅔ', 'ㅕ', 'ㅖ', 'ㅗ', 'ㅘ', 'ㅙ',
    'ㅚ', 'ㅛ', 'ㅜ', 'ㅝ', 'ㅞ', 'ㅟ', 'ㅠ', 'ㅡ', 'ㅢ', 'ㅣ',
];
/// Final consonants (jongseong); index 0 means "no final consonant"
static JONG: [char; 28] = [
    ' ', 'ㄱ', 'ㄲ', 'ㄳ', 'ㄴ', 'ㄵ', 'ㄶ', 'ㄷ', 'ㄹ', 'ㄺ',
    'ㄻ', 'ㄼ', 'ㄽ', 'ㄾ', 'ㄿ', 'ㅀ', 'ㅁ', 'ㅂ', 'ㅄ', 'ㅅ',
    'ㅆ', 'ㅇ', 'ㅈ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ',
];

/// Splits Hangul syllables into Hangul jamo.
/// Based on the following two pages:
/// * <a target="_blank" href="http://westzero.tistory.com/112">http://westzero.tistory.com/112</a>
/// * <a target="_blank" href="http://helloworld.naver.com/helloworld/76650">http://helloworld.naver.com/helloworld/76650</a>
///
/// Characters outside 가 (U+AC00) ..= 힣 (U+D7A3) are skipped.
fn syllables_to_jamo(string: &amp;str) -&gt; Vec&lt;char&gt; {
    let mut v: Vec&lt;char&gt; = Vec::new();
    for c in string.chars() {
        let code = c as usize;
        if code &lt; 0xAC00 || code &gt; 0xD7A3 {
            continue; // not a precomposed Hangul syllable
        }
        let a = code - 0xAC00;
        let cho = a / (21 * 28);
        let jung = (a % (21 * 28)) / 28;
        let jong = a % 28;
        v.push(CHO[cho]);
        v.push(JUNG[jung]);
        if jong &gt; 0 {
            v.push(JONG[jong]); // index 0 = no final consonant
        }
    }
    v
}

fn main() {
    let jamo = syllables_to_jamo("한글");
    for c in jamo {
        print!("{} ", c); // prints: ㅎ ㅏ ㄴ ㄱ ㅡ ㄹ
    }
}</code></pre></div> <h1 style="margin:0px 0px 15px;padding:0px;font-weight:300;">Addendum</h1> <p style="margin:0px 0px 15px;padding:0px;">Splitting syllables into jamo makes stunts like this possible: yes, it's the name-compatibility test (이름궁합) that became (in)famous after <a target="_blank" href="http://www.mediatoday.co.kr/news/articleView.html?idxno=122835" style="color:#1756a9;text-decoration:none;" target="_blank">a certain channel put it on a news broadcast</a>.</p> <p style="margin:0px 0px 15px;padding:0px;"><img src="http://regentag.github.io/static/rust-hangul-jamo/namesync.png" alt="" style="vertical-align:middle;"></p> <p style="margin:0px 0px 15px;padding:0px;">This tells us that Rust (러스트) and Windows (윈도우) have a synchro rate of only 41%. <del>Swap the names around, though, and it's 92%.</del></p> <p style="margin:0px 0px 15px;padding:0px;">The name-compatibility code is on <a target="_blank" href="https://gist.github.com/Regentag/c82b2534b08fb4b699bf" style="color:#1756a9;text-decoration:none;" target="_blank">Gist</a>.</p></article></div></div>
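<div style="margin-bottom:30px;color:#111111;font-family:'맑은 고딕', sans-serif;font-size:16px;line-height:24px;background-color:#fdfdfd;"><p style="margin:0px 0px 15px;padding:0px;">The composition formula described earlier can also be run in the forward direction, turning table indices back into a syllable. Here is a minimal sketch; <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">compose_syllable</code> and <code style="font-size:15px;border:1px solid rgb(232,232,232);padding:1px 5px;background-color:#eeeeff;">decompose_syllable</code> are hypothetical names (not from the original Gist or any library), they work on the table's numeric indices rather than on jamo characters, and out-of-range input simply panics.</p> <div class="highlight" style="margin-bottom:15px;background:#ffffff;"><pre style="margin-top:0px;margin-bottom:15px;padding:8px 12px;font-size:15px;border:1px solid rgb(232,232,232);background-color:#eeeeff;"><code class="language-rust" style="border:0px;padding:1px 0px;">/// Hypothetical helper: builds a syllable from table indices using
/// U+AC00 + ((initial * 21) + medial) * 28 + final.
/// Panics if an index is outside the table.
fn compose_syllable(cho: u32, jung: u32, jong: u32) -> char {
    assert!(matches!(cho, 0..=18), "initial index out of range");
    assert!(matches!(jung, 0..=20), "medial index out of range");
    assert!(matches!(jong, 0..=27), "final index out of range");
    let code = 0xAC00 + (cho * 21 + jung) * 28 + jong;
    // With the checks above, code is always within U+AC00..=U+D7A3.
    char::from_u32(code).expect("valid Hangul syllable")
}

/// Hypothetical inverse: returns (initial, medial, final) indices.
/// Panics if `c` is not a precomposed Hangul syllable.
fn decompose_syllable(c: char) -> (u32, u32, u32) {
    assert!(matches!(c, '\u{AC00}'..='\u{D7A3}'), "not a Hangul syllable");
    let x = c as u32 - 0xAC00;
    (x / (21 * 28), (x % (21 * 28)) / 28, x % 28)
}

fn main() {
    // '한' = ㅎ (18) + ㅏ (0) + ㄴ (4), as in the worked example.
    let han = compose_syllable(18, 0, 4);
    println!("{} U+{:04X}", han, han as u32); // prints: 한 U+D55C
    assert_eq!(decompose_syllable(han), (18, 0, 4));

    // Round trip: every syllable in the block decomposes and recomposes to itself.
    for c in '\u{AC00}'..='\u{D7A3}' {
        let (cho, jung, jong) = decompose_syllable(c);
        assert_eq!(compose_syllable(cho, jung, jong), c);
    }
}</code></pre></div> <p style="margin:0px 0px 15px;padding:0px;">Keeping the arithmetic on indices separates it from the character tables: turning an index into a jamo is then just a lookup into arrays like CHO, JUNG, and JONG.</p></div>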