Turen's NLP Log: 2011

Monday, 21 November 2011

Converting a PPT slide to an EPS file

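As someone who cannot avoid writing papers, I regularly need EPS files to embed in LaTeX documents. Here is how to draw a figure in PowerPoint and end up with an EPS file.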

1. Make a one-page slide in PowerPoint.
2. Print it to PDF (current slide only).
3. Print the PDF again to PS (print to file).
4. Convert the PS to EPS with the PS -> EPS option.
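
For batches of figures, the last two steps can also be scripted. The sketch below is an alternative to the print-driver route above, not the same method: it assumes the `pdftops` tool from Poppler/Xpdf is installed, and the helper names `build_eps_command` and `pdf_to_eps` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_eps_command(pdf_path, eps_path=None):
    """Return the pdftops command that converts a one-page PDF to EPS."""
    pdf = Path(pdf_path)
    # Default output name: same file name with an .eps suffix.
    eps = Path(eps_path) if eps_path else pdf.with_suffix(".eps")
    # -eps asks pdftops for Encapsulated PostScript instead of plain PS.
    return ["pdftops", "-eps", str(pdf), str(eps)]


def pdf_to_eps(pdf_path, eps_path=None):
    """Run the conversion; raises if pdftops is missing or exits with an error."""
    if shutil.which("pdftops") is None:
        raise RuntimeError("pdftops not found on PATH (assumed to come from Poppler/Xpdf)")
    cmd = build_eps_command(pdf_path, eps_path)
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Calling `pdf_to_eps("figure.pdf")` would write `figure.eps` next to the PDF. Fonts and bounding boxes may come out differently from the printer-driver route, so the result is worth checking in the compiled paper.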

Wednesday, 22 June 2011

1. PPT ---(CutePDF) -> PDF

2. PDF ---(ToPS) -> PS (choose auto-rotate and center)

3. PS -> EPS

How to set up a new printer named ToPS (http://u.cs.biu.ac.il/~herzbea/makeP.htm)

To install the MS Publisher Imagesetter virtual printer, open the `Printers` settings folder, and click `Add Printer`. Select a local printer; since it is virtual, select the FILE: port. Specify `Generic` as manufacturer, and then you can choose the MS Publisher Imagesetter.

After you install the printer, you may want to fine-tune its properties. To do so, open the `Printers` settings folder again, right-click the printer, and select `Properties`. Under `Device Settings`, set the following two parameters to very low values (e.g. 5):

Minimum font size to download as outline

Maximum font size to download as bitmap

Next, go to the `Advanced` tab, and from there, select `Printing Defaults…`. In the window that opens, select `Document options`, then `Postscript options`. Set the following two options:

PostScript Output Option: Optimize for Portability

TrueType Font Download Option: Outline

Monday, 23 May 2011

Moses: Training a Hierarchical Model

Notes:

--2011.5.24
1. Set input-type to 0 for hiero/string-to-tree, and to 3 for tree-to-string.
2. The decoder to use is /moses-chart-cmd/src/moses_chart

--2011.5.23
When running train-model.perl, use -hierarchical -glue-grammar -max-phrase-length 5. Remember to drop the -reordering option that was used when training the phrase model.

Sunday, 23 January 2011

Training a Moses-based Chinese-English translation system

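I have just started learning SMT, and the first step is to get familiar with how a whole SMT system runs, so I began practicing with Moses. I ran into a few problems along the way and am writing them down so I do not forget them.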

I basically followed the guide below, testing step by step. It is very well written.
http://www.statmt.org/moses_steps.html

Even so, I ran into several problems.
Test environment: 2.6.31-22-server #68-Ubuntu SMP Tue Oct 26 16:50:02 UTC 2010 x86_64 GNU/Linux

----------------
Supporting software
----------------
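++Keep the versions of the supporting software as close to the guide's as possible. If the exact version is not available, pick a slightly earlier one. Later versions bring some uncertainty; I have not yet explored which versions work well.
For example, SRILM 1.5.7 is not available on the homepage. I first picked the latest version and ran into problems; after switching to 1.4.6, everything worked.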

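++While compiling moses-scripts, there are some complaints about missing Boost libraries. In practice they do no harm (they may matter somewhere, but for a beginner like me it makes no difference).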

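++If Moses is downloaded directly, the files sometimes have DOS line endings (^M), and the Perl scripts then fail to run.
Solution 1: add a space at the end of the first line of every Perl file.
Solution 2: invoke the scripts as perl a.pl at every call site.
Solution 3: check out the code with svn co https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk moses (this is also what the guide recommends; I was naive at first and downloaded it straight from the web).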
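
A fourth option, which I did not use at the time, is to strip the carriage returns from the scripts themselves. The usual cause of this failure is that the `#!` line ends in `\r`, so the system looks for an interpreter whose name includes that character. This is a minimal sketch; the function names and the `.pl` / `.perl` suffix filter are my own choices, not part of Moses.

```python
from pathlib import Path


def strip_carriage_returns(path):
    """Rewrite one file with LF line endings; return True if it changed."""
    p = Path(path)
    data = p.read_bytes()
    # Work on bytes so the file's text encoding is left untouched.
    fixed = data.replace(b"\r\n", b"\n")
    if fixed == data:
        return False
    p.write_bytes(fixed)
    return True


def fix_perl_scripts(root):
    """Convert every *.pl / *.perl file under root; return the changed paths."""
    changed = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and p.suffix in {".pl", ".perl"}:
            if strip_carriage_returns(p):
                changed.append(p)
    return changed
```

Running `fix_perl_scripts("moses")` on the unpacked download rewrites the files in place, so it is safer to try it on a copy first.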

---------------
Training process
---------------
++The corpus must be UTF-8.
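
A quick way to check this before training is to try decoding each line. This is only a sketch: `first_invalid_utf8_line` is a hypothetical helper of mine, and it only shows that the bytes are valid UTF-8, not that the text is clean.

```python
def first_invalid_utf8_line(path):
    """Return the 1-based number of the first line that is not valid UTF-8, or None."""
    with open(path, "rb") as f:
        # Read raw bytes so that decoding errors surface per line.
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                return lineno
    return None
```

A corpus file saved as GBK, for example, would usually fail on its first line containing Chinese characters.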

++GIZA++ crashed with an overflow (core dump) during training. Following a fix I found through Google, I made the change below, and it works!
*** file_spec.h 2009/07/10 21:38:39 1.1
--- file_spec.h 2009/07/13 11:37:21
! char time_stmp[17];

! sprintf(time_stmp, "%02d-%02d-%02d.%02d%02d%02d.", local->tm_year,
(local->tm_mon + 1), local->tm_mday, local->tm_hour,
local->tm_min, local->tm_sec);
--- 37,49 ----
! char time_stmp[19];

! sprintf(time_stmp, "%04d-%02d-%02d.%02d%02d%02d.", 1900 + local->tm_year,
(local->tm_mon + 1), local->tm_mday, local->tm_hour,
local->tm_min, local->tm_sec);
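
My reading of why this patch helps, as a small arithmetic check rather than anything taken from the GIZA++ source: `tm_year` counts years since 1900, so from the year 2000 onward it has three digits and the old format no longer fits in `char[17]` once the terminating NUL is counted. The helper `c_string_size` is made up for this illustration.

```python
def c_string_size(fmt, *values):
    """Bytes a C char buffer needs for the formatted text plus the NUL terminator."""
    return len(fmt % values) + 1


# struct tm stores the year as years since 1900, so 2011 is 111.
tm_year, mon, mday, hour, minute, sec = 111, 1, 23, 12, 0, 0

old_size = c_string_size("%02d-%02d-%02d.%02d%02d%02d.",
                         tm_year, mon, mday, hour, minute, sec)
new_size = c_string_size("%04d-%02d-%02d.%02d%02d%02d.",
                         1900 + tm_year, mon, mday, hour, minute, sec)

print(old_size)  # 18: one byte more than the original char[17]
print(new_size)  # 19: exactly fills the patched char[19]
```

With a two-digit `tm_year` (any year before 2000) the old format needs exactly 17 bytes, which would explain why the original buffer size once looked sufficient.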

++When running MERT, add --mertdir to specify the path. Did older versions not have this problem? I am not sure.

++I used a 64-bit machine here, so the script path should be i686-m64.

That is roughly it. The only thing left to do is wait... until the model finishes training.

-------------
Preliminary results
--------------
++Settings
FBIS as the bilingual corpus
GIGA_xin to train the language model (first 1M sentences only)
GIGA_xin to train the recaser (all of it)
No MERT

++BLEU
19.79

Wow, that high! Time for a break.

2011.1.24