Friday, May 10, 2013

Removal of GPLv2

Thanks to John, who pointed out that the additional term is not compatible with GPLv2. Also, I have checked again and FontForge is now released under GPLv3+, so (most parts of) pdf2htmlEX are now released under GPLv3 only (with additional terms).

Licence Changed

I am changing the license of pdf2htmlEX, but most of you should not be worried.

As you know, pdf2htmlEX is released under GPLv2 or GPLv3, with a few files released under the MIT License. Unlike the AGPL, the GPL does not require the source code to be shared when the software is only used to run an online service.

I don't think it is necessary to apply the AGPL, since a wise service provider should realize that making their modifications public is to their own advantage, and indeed I've received several patches from service providers.

Unlike most GPL software, pdf2htmlEX is designed for service providers. I expect the common use to be customization for different services rather than redistribution. But I don't want it to end up with lots of wrappers and no feedback. So in order to let more people know about this technology, and to attract more feedback, I recently added a new term to the license:

If you want to use pdf2htmlEX (or your modified version) in your online service,
through which a user can provide one or more files through a computer network,
and view any part of the result produced by pdf2htmlEX (or your modified version) on the file(s) provided by the user, you should credit pdf2htmlEX with a proper link to in the page where the result is presented, or the homepage of your service, or a page directly accessible from the homepage of your service.
Any derivative works should also include this term.
For example, you should credit pdf2htmlEX if pdf2htmlEX (or your modified version) must be called after a user uploads some files and before the user can see the result of the files.

Here are a few explanations:
This term applies if your service involves "online conversion", which means the service allows users to upload files and view the result of the conversion by pdf2htmlEX (or your modified version). This term does not apply if you convert documents of your own and present them online -- but you are still encouraged to credit pdf2htmlEX.

Three locations are mentioned in the term:
  1. The page where you present the converted document
  2. The homepage
  3. A page directly accessible from the homepage
The first two should be intuitive. But if I were the UI designer I might want a clean homepage, and I would not want to see the ugly logos of those document-embedding plugins, so I added the third one. I expect it to be the About page, the Acknowledgement page, or a page where you list the technologies used in your service.

If you have done this, you are encouraged to send me the name of your service and a URL to where pdf2htmlEX is credited. This is for statistics, and in the future I may create a list of 'sites that use pdf2htmlEX'.

I'm not a lawyer, and I don't know how this is done in other software. I just want to express my thoughts in this term, which I hope this post explains clearly.

Please tell me what you think about this additional term!

Monday, May 6, 2013

pdf2htmlEX v0.8.1 is out

Download here

If you download and install this version, `pdf2htmlEX -v` actually shows v0.9. This is due to my naive git branching model. Please contact me if this is too annoying. I'll fix it.

This is a quick fix for v0.8, except that `--optimize-text` is now turned off by default. This parameter turns out to be still buggy, and I'll try to fix it in the next release.

The next release will focus on optimization, mainly of background images. The idea is to use a number of rectangles to cover the occupied areas, instead of one big image for each whole page. The second step would be to combine the rectangles together, and use something like CSS sprites or CSS clipping in order to reduce the number of requests.
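
To make the rectangle idea a bit more concrete, here is a rough Python sketch. It is not how pdf2htmlEX does or will do it, just one simple way to illustrate the two steps. It assumes the page background is available as a grid of 0/1 cells (1 = occupied), uses one bounding box per connected blob as the "rectangles", and stacks them vertically into a single sprite. The names `bounding_boxes`, `pack_vertically`, `css_rule` and the file name `bg.png` are all made up for this example.

```python
from collections import deque


def bounding_boxes(mask):
    """Return one (x, y, w, h) box per 4-connected blob of occupied cells."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill the blob starting here and track its extent.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


def pack_vertically(boxes):
    """Stack the boxes top to bottom in one sprite image.

    Returns (offsets, sprite_width, sprite_height), where offsets[i] is the
    (x, y) position of boxes[i] inside the sprite.
    """
    offsets = []
    cursor = 0
    for _, _, _, h in boxes:
        offsets.append((0, cursor))
        cursor += h
    sprite_width = max((w for _, _, w, _ in boxes), default=0)
    return offsets, sprite_width, cursor


def css_rule(box, offset, sprite_url):
    """CSS that shows one sprite region at its original place on the page."""
    x, y, w, h = box
    sx, sy = offset
    return (
        f"position:absolute;left:{x}px;top:{y}px;width:{w}px;height:{h}px;"
        f"background:url({sprite_url}) -{sx}px -{sy}px no-repeat;"
    )


# A tiny 6x5 "page" with two occupied areas.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
boxes = bounding_boxes(page)
offsets, sprite_w, sprite_h = pack_vertically(boxes)
rules = [css_rule(b, o, "bg.png") for b, o in zip(boxes, offsets)]
```

With this layout the browser fetches one small sprite per page instead of a full-page image, and each rectangle is positioned with `background-position`. A real implementation would need a smarter packing and some way to merge rectangles that are close to each other.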


Sunday, May 5, 2013

pdf2htmlEX hits Top Trending Repos

As of May 6th, 2013, pdf2htmlEX hits the Monthly Trending Repos at GitHub:
It has also become the top daily & weekly trending repo. Didn't see this coming!

I realized that it might be necessary to start a blog to share news and other stuff, technical and non-technical, about pdf2htmlEX. And here it is.

pdf2htmlEX, just as its name suggests, converts PDF into HTML. How does it work? Let the demos speak:

Many people wonder why they should ever convert PDF to HTML. The short answer is that they should not, because they are viewers, while this tool is designed for publishers.

This is the era of the Web. For many people, the Internet = the World Wide Web. When not at work, I rarely let my screen be occupied by any window other than a browser. What can you not do with a browser? I like web pages: they have become more and more elegant, yet powerful (to use) and simple (to compose).

Despite the development of HTML/CSS/JavaScript, what is your experience with reading PDF files online? PDF is always the first choice for anything involving printing, not to mention for LaTeX users. But when you put an 'online' after it, I'd say it is terrible. Years ago, online PDF reading meant ugly, insecure, unstable and slow plugins that never released my keyboard & mouse focus. And now browsers have started to implement their own built-in PDF viewers -- PDF is so popular, and the plugins are so bad, that Web browsers have to do this to comfort users.

Another thing I like in web pages, but not in PDF files, is interaction. A quick example is links: on Wikipedia, you get a rather smooth flow of information while your cursor dances among the links. Not to mention all kinds of CSS/JavaScript tricks that amaze you. The key is that everything is accessible. PDF, on the other hand, is more like a black box, or an <iframe>: it does have many features, but you (the hosting web page or the browser) never know what's going on inside.

This is not fair, since PDF was never designed for this. But the point is that web technologies are powerful enough to render PDF files, and people need this -- see Crocodoc and SlideShare. pdf2htmlEX works as a bridge, and the goal is to turn 'Everything to PDF' into 'Everything to Web'. Just imagine:

  • Your carefully designed resume can be published online with Google Analytics embedded.
  • Your slides can be shown online with all kinds of CSS/JavaScript eye candy.
  • PDF documents will never make your web sites ugly.

Hopefully some day in the future, we will not be able to tell HTML from PDF by appearance, just as we cannot tell JPEG from PNG (or can you?).