Vlad Skvortsov's Blog


My name is Vlad Skvortsov, I'm a software engineer: it's my job, hobby and addiction.

My primary interests include high-performance, scalable, fault-tolerant distributed systems; server-side applications; information retrieval technologies; and the procedural aspects of the software engineering process.

I work on several private and open-source projects, hacking in Python, Haskell, Erlang, Perl and C.

Guitar, hiking, ice hockey and other hobbies help me to balance my life.

My e-mail is vss@73rus.com.

Cyrillic Fonts In R Plots

May 20 2007, 13:46

I've been using the R Project for statistics-related tasks in my ongoing Ph.D. work. It's a great system, and the only thing I had been missing so far was the ability to annotate plots with Russian text in Cyrillic. Of course R is capable of that, but I haven't found a complete recipe for doing it anywhere on the web.

It turned out to be a nontrivial task. Apparently, there are few Russian people using R on Mac OS X. I asked for help on the R mailing list, but the instructions didn't quite work, since my configuration was different (I use a KOI8-R locale). I was also referred to Vol 6/2 (May 2006) of R News, which was helpful but, again, didn't offer a working solution to my problem.

So here is how I made it work. First of all, R doesn't include fonts with Cyrillic glyphs in the distribution (no surprise here), so you need to download some. I followed the reference in the above-mentioned R News article and got a pack of "PSCyr" fonts from here (you only need PSCyr-0.4c-patch2-type1.tar.gz).

At plotting time (i.e. when you use "plot", "grid" or other drawing primitives) R actually needs to know only the font metrics, i.e. how much space each glyph takes up. That's because the output PDF doesn't contain the font data itself, i.e. how to draw the glyphs. (The same goes for PostScript; I just stick to PDFs since they render faster in the Preview application.) To make R aware of the fonts you are going to use, you create a "font object" with the Type1Font() function, since the PSCyr fonts are Type 1 fonts. I chose the Times face, so in my case the commands looked like this:

> afms = c("times.afm", "timesbd.afm", "timesi.afm", "timesbi.afm")
> RU = Type1Font("TimesNewRomanPSMT-Regular", afms, "KOI8-R")

Two things here deserve attention. First, the font metrics (the .afm files) need to reside in R's font metrics directory (which happens to be /Library/Frameworks/R.framework/Resources/library/grDevices/afm on my system), or you have to specify the full path to each of them. Second, the encoding name passed to Type1Font() is the terminal encoding, not the font encoding. Actually, I'm not sure there is such a thing as a "font encoding" at all (I checked the PSCyr fonts with FontForge and they seem to be in CP1251), and I was too lazy to dig into this issue.
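
If you'd rather not copy the .afm files into R's directory, the second option (full paths) can look like the minimal sketch below. It assumes the PSCyr metrics were unpacked into a hypothetical ~/lib/fonts/pscyr/afm directory, so adjust that to your setup. It reports missing files up front instead of letting Type1Font() fail. system.file("afm", package = "grDevices") returns R's own metrics directory (the location mentioned above) on any platform.

```r
# R's own metrics directory: bare file names passed to Type1Font()
# are looked up here (the author's /Library/.../grDevices/afm).
r_afm_dir <- system.file("afm", package = "grDevices")

# Hypothetical location of the unpacked PSCyr metrics; adjust as needed.
pscyr_dir <- path.expand("~/lib/fonts/pscyr/afm")

afms <- c("times.afm", "timesbd.afm", "timesi.afm", "timesbi.afm")
afm_paths <- file.path(pscyr_dir, afms)

# Check that every metrics file is present before building the font object.
absent <- afm_paths[!file.exists(afm_paths)]
if (length(absent) > 0) {
  message("Missing metrics: ", paste(absent, collapse = ", "))
} else {
  RU <- Type1Font("TimesNewRomanPSMT-Regular", afm_paths, "KOI8-R")
}
```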

The font name used here, "TimesNewRomanPSMT-Regular", comes from the PSCyr files: look into any of the .afm files and you'll see "FontName TimesNewRomanPSMT-Regular" inside.
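
Instead of opening the .afm files by hand, you can pull the name out from R. afm_font_name() below is a hypothetical helper, not part of R. It simply returns the text after the first "FontName" keyword in the file.

```r
# Hypothetical helper: return the PostScript name declared on the
# "FontName" line of an AFM file (the name used in Type1Font() above).
afm_font_name <- function(path) {
  lines <- readLines(path, warn = FALSE)
  hit <- grep("^FontName[[:space:]]", lines, value = TRUE)
  if (length(hit) == 0) stop("no FontName line in ", path)
  trimws(sub("^FontName[[:space:]]+", "", hit[1]))
}
```

For the PSCyr Times metrics, afm_font_name("times.afm") should give the same "TimesNewRomanPSMT-Regular" as above.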

So at this point we've constructed the font object and stored it in the variable RU. Now we have to associate a name with the font. This is done with:

> pdfFonts("RU"=RU)

Here we associate a name, "RU", with our font object.
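
To double-check that the registration worked, you can query the same font database. This sketch uses only pdfFonts() itself. "Times" is one of the families R registers by default, so the last line works even in a fresh session.

```r
# With no arguments, pdfFonts() returns the pdf() font database as a
# named list; the names are the registered family names.
registered <- names(pdfFonts())

# TRUE once pdfFonts("RU"=RU) has run in this session, FALSE otherwise.
"RU" %in% registered

# An unnamed string looks a family up instead of registering one,
# e.g. one of R's built-in families:
pdfFonts("Times")
```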

Now we are ready to use the font in plots, like this:

> pdf("russian.pdf")
> plot(1, 1, family="RU", main="Заголовок")
> library(grid)
> grid.text("надпись", gp=gpar(fontfamily="RU"))
> dev.off()

You can see that plot() and grid.text() take the font family in different ways (the family argument vs. gpar(fontfamily = ...)), but in both cases we use the name we've assigned to our font.
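
When experimenting interactively it's easy to forget dev.off(), or to hit an error before reaching it, which leaves the PDF unfinished. The sketch below is a hypothetical wrapper, with_pdf(), that uses on.exit() so the device is always closed. It also passes the registered name as pdf()'s family argument, which makes it the device's default family, so individual calls don't need family= (an alternative to the per-call style above, not the method used in this post).

```r
# Hypothetical wrapper: open a pdf() device whose default font family is
# `family`, evaluate the plotting code, and close that device afterwards,
# even if the plotting code throws an error.
with_pdf <- function(file, expr, family = "RU") {
  pdf(file, family = family)
  dev <- dev.cur()
  on.exit(dev.off(dev))
  force(expr)  # `expr` is a lazy argument: the plotting code runs here
  invisible(file)
}

# Usage, once "RU" is registered via pdfFonts():
# with_pdf("russian.pdf", plot(1, 1, main = "Заголовок"))
```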

At this point the PDF we've created only refers to the font by name ("TimesNewRomanPSMT-Regular"). So in order to view the file, the font has to be installed on the system, as it's not one of the standard PDF fonts. For internal use that's probably OK, but to distribute the PDF we have to embed the font into the file. R has a function specifically for this purpose, which we use like this:

> embedFonts("russian.pdf", fontpaths="/Users/vss/lib/fonts/pscyr/type1")

Here the first argument is the file to process and the second is the path to the font files. Make sure that path contains the actual font data (e.g. .pfb files), not just the .afm metrics.
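
Since embedFonts() needs the outline files, a quick sanity check before calling it can save a confusing Ghostscript error. has_outlines() is a hypothetical helper. It only lists the top level of the directory, on the assumption that Ghostscript doesn't search font directories recursively.

```r
# Hypothetical check: does a font directory hold Type 1 outline files
# (.pfb or .pfa), not only .afm metrics? Only the top level is listed.
has_outlines <- function(dir) {
  files <- list.files(dir, pattern = "\\.(pfb|pfa)$", ignore.case = TRUE)
  length(files) > 0
}

fontdir <- "/Users/vss/lib/fonts/pscyr/type1"  # the path used above
if (has_outlines(fontdir)) {
  embedFonts("russian.pdf", fontpaths = fontdir)
} else {
  message("No .pfb/.pfa files in ", fontdir)
}
```

Note that embedFonts() writes its result back to the input file unless you pass a different outfile argument.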

Under the hood, embedFonts() calls the Ghostscript interpreter, so I had to install it. I used Ghostscript 8.56 ("make all install" worked just fine). Be sure to install the Ghostscript fonts as well; otherwise it fails with cryptic error messages about not being able to find the "Helvetica" font family.
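
Before calling embedFonts() it can help to check that R will find Ghostscript at all. R takes the executable from the R_GSCMD environment variable and otherwise falls back to a default command. The sketch below is a hypothetical helper for Unix-like systems that approximates that lookup by searching for "gs" on the PATH.

```r
# Hypothetical helper (Unix-like systems): locate Ghostscript roughly the
# way embedFonts() does, R_GSCMD first, then "gs" on the PATH.
find_gs <- function() {
  gs <- Sys.getenv("R_GSCMD")
  if (nzchar(gs)) return(gs)
  unname(Sys.which("gs"))  # "" when gs is not on the PATH
}

gs <- find_gs()
if (!nzchar(gs)) message("Ghostscript not found; embedFonts() will fail")
```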

That seems to be it. Hope it saves someone some time.