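Information Development and Utilization

令和維新 (Reiwa Ishin)

Since leaving corporate life in April 2004, I have been trying my hand at consulting and information entrepreneurship, aiming to recover the lost 20 years and help revive Japan. I keep updating this home page, which a friend introduced to me, in the hope that my path will inspire as many successors as possible. We are now in the age of big data, AI, IoT, blockchain, and cryptocurrency, and I want to record my activities so far and share my view of the current situation along with the related technologies.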


2021.05.28

(3) A Homebrew Reuse Analyzer

Category: Information Development

Calculating the score is equally simple: if l1 and l2 are the lengths of the two strings, and d is their Levenshtein distance, the score is (l1 + l2 - d) / (l1 + l2).
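As a minimal sketch of the score described above (not the author's own script), the following Python computes a Levenshtein distance with the usual dynamic-programming approach and plugs it into the formula. The function names `levenshtein` and `similarity_score` are illustrative assumptions, as is returning 1.0 for two empty strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def similarity_score(s1: str, s2: str) -> float:
    """Score from the text: (l1 + l2 - d) / (l1 + l2)."""
    total = len(s1) + len(s2)
    if total == 0:
        # Assumption: treat two empty strings as identical.
        return 1.0
    return (total - levenshtein(s1, s2)) / total


if __name__ == "__main__":
    print(similarity_score("Press the Power button.", "Press the power button."))
```

Identical strings score 1.0. Because d can never exceed the length of the longer string, two completely different strings of equal length still score 0.5, so any threshold for "similar" would need to sit well above that.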

There are other fuzzy matching techniques, but I used this one as a starting point.

Preparing the content for analysis

Ideally, the content needs to be stripped of all markup, and the text of each block element should sit on a single line. My original thought was to write (or find) a DITA-OT plugin that would publish a bookmap to CSV, where each record would contain the file name and one block (or paragraph, if you prefer) of text.
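To illustrate the record layout described above, here is a small Python sketch that writes one CSV record per block, each holding the file name and the block text. It is not a DITA-OT plugin: it assumes the markup has already been stripped and that each input line is one block. The function name `blocks_to_csv` and the `file` and `text` column headers are hypothetical.

```python
import csv
import io
from typing import Mapping, TextIO


def blocks_to_csv(files: Mapping[str, str], out: TextIO) -> int:
    """Write one (file name, block text) record per non-empty line.

    `files` maps a file name to markup-free text, one block per line
    (an assumption of this sketch). Returns the number of blocks written.
    """
    writer = csv.writer(out)
    writer.writerow(["file", "text"])  # hypothetical column names
    count = 0
    for name, text in files.items():
        for line in text.splitlines():
            block = " ".join(line.split())  # collapse runs of whitespace
            if block:
                writer.writerow([name, block])
                count += 1
    return count


if __name__ == "__main__":
    sample = {"intro.txt": "Press the power button.\n\nWait for the light, then release.\n"}
    buffer = io.StringIO()
    blocks_to_csv(sample, buffer)
    print(buffer.getvalue())
```

Using the `csv` module rather than joining fields by hand means blocks that contain commas or quotes are escaped correctly.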

Believe it or not, preparing the content took more effort than writing the analysis script. After a brief experiment with a “plain text” plugin, I decided to try exporting to Markdown, a transform built into DITA-OT 3.1 and newer. From there, a utility called pandoc stripped the remaining markup and eliminated line-wrapping. The commands can be placed in a shell script: