-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2007-09-13 at 19:03 +0200, miguel gmail wrote:
electronic, which is precisely the case. It is an archiving format that competes with PDF. It can also include a layer of plain text extracted by OCR.
I'm missing one thing. What do you mean by 'it can also include a layer of plain text extracted by OCR'?
It is a format that contains several "layers" of data. It can have a black-and-white layer with the lettering (the one that is sent first over the web) at a resolution high enough to be easily legible, and another layer at a different resolution with the colours. I seem to remember that the background colour (the colour of the paper) can also be stored separately. And it can have yet another layer (an embedded file, if you prefer to put it that way) that holds the text as plain text, which has various uses, such as indexing, archiving, electronic search for a document that contains a certain phrase...

That "layering" is what I am unable to generate. I don't know how to do it with free tools. And yet it can be done.

All of this is surely explained in the links I posted; I am speaking from memory and will have inaccuracies and errors. For example, Wikipedia:

DjVu (pronounced déjà vu) is a computer file format designed primarily to store scanned images, especially those containing text and line drawings. It uses technologies such as image layer separation of text and background/images, progressive loading, arithmetic coding, and lossy compression for bitonal images. This allows for high quality, readable images to be stored in a minimum of space, so that they can be made available on the web.

DjVu has been promoted as an alternative to PDF, actually outperforming PDF on most scanned documents. The DjVu developers report that color magazine pages compress to 40-70KB, black and white technical papers compress to 15-40KB, and ancient manuscripts compress to around 100KB; all of these are significantly better than the typical 500KB required for a satisfactory JPEG image. Like PDF, DjVu can contain an OCRed text layer, making it easy to perform cut and paste and text search operations.

...

DjVu divides a single image into many different images, then compresses them separately. To create a DjVu file, the initial image is first separated into three images: a background image, a foreground image, and a mask image.
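The three-way split in that last sentence can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not what a real DjVu encoder does: the fixed luminance threshold, the hypothetical function names split_layers and merge_layers, and the list-of-RGB-tuples image representation are all made up for the example, and real segmenters are far more elaborate.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Image = List[List[Pixel]]      # rows of pixels (assumed representation)
Mask = List[List[int]]         # 1 = text/ink pixel, 0 = paper pixel


def luminance(p: Pixel) -> float:
    """Perceived brightness of a pixel (ITU-R BT.601 weights)."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def mean_color(pixels: List[Pixel], default: Pixel) -> Pixel:
    """Integer average colour of a pixel list, or `default` if it is empty."""
    if not pixels:
        return default
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) // n,
        sum(p[1] for p in pixels) // n,
        sum(p[2] for p in pixels) // n,
    )


def split_layers(image: Image, threshold: float = 128.0) -> Tuple[Mask, Image, Image]:
    """Split an image into (mask, foreground, background).

    Assumption: every pixel darker than `threshold` is text.
    """
    mask = [[1 if luminance(p) < threshold else 0 for p in row] for row in image]
    ink = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if m]
    paper = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if not m]
    ink_color = mean_color(ink, (0, 0, 0))
    paper_color = mean_color(paper, (255, 255, 255))
    # Foreground keeps the colour of text pixels; the rest is filled with the
    # average ink colour so the layer stays smooth.
    foreground = [
        [p if m else ink_color for p, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]
    # Background keeps the paper pixels; text holes are filled with the
    # average paper colour.
    background = [
        [paper_color if m else p for p, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]
    return mask, foreground, background


def merge_layers(mask: Mask, foreground: Image, background: Image) -> Image:
    """Rebuild the page: take the foreground where the mask is set, else the background."""
    return [
        [f if m else b for m, f, b in zip(mrow, frow, brow)]
        for mrow, frow, brow in zip(mask, foreground, background)
    ]


INK, PAPER = (20, 20, 30), (250, 248, 240)
page = [[PAPER, INK, PAPER], [PAPER, PAPER, INK]]
mask, fg, bg = split_layers(page)
print(mask)
```

In this toy version merging the three layers gives back the original page exactly. Since the two colour layers no longer contain sharp lettering, they are the parts that tolerate being stored more coarsely than the mask.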
The background and foreground images are typically lower-resolution color images (e.g., 100dpi); the mask image is a high-resolution bilevel image (e.g., 300dpi) and is typically where the text is stored. The background and foreground images are then compressed using a wavelet-based compression algorithm named IW44. The mask image is compressed using a method called JB2 (similar to JBIG2). The JB2 encoding method identifies nearly-identical shapes on the page, such as multiple occurrences of a particular character in a given font, style, and size. It compresses the bitmap of each unique shape separately, and then encodes the locations where each shape appears on the page. Thus, instead of compressing a letter "e" in a given font multiple times, it compresses the letter "e" once (as a compressed bit image) and then records every place on the page it occurs.

In 2002 the DjVu file format was chosen by the Internet Archive as the format in which its Million Book Project provides scanned public domain books online (along with TIFF and PDF). DjVu format will be used by the One Laptop per Child project in order to easily supply existing paper books in an eBook format. The advantage of DjVu is that it is highly compressed and it does not require any font support. [1]

...

PDF is most useful when the original source is an electronic document such as a Microsoft Word doc or TeX file. Such documents benefit most from the vector graphics technology that underlies PDF. DjVu files can be marginally smaller but only deliver a high resolution image, possibly enriched with the associated text. DjVu is very good for image files, and has especially been optimized for scanned text and images. If one has a set of scanned pages from a book or article, DjVu is superior to PDF.
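Going back to the JB2 paragraph quoted above: the idea of storing each shape once and then recording where it occurs can be sketched in Python as follows. This is a rough illustration, not the real JB2 coder. It assumes the glyphs have already been cut out of the mask, represents each bitmap as a tuple of '0'/'1' strings, and uses a plain pixel-difference count (the hypothetical max_diff parameter) as a stand-in for the "nearly identical" test. The names encode_page and decode_page are invented for the example, and the arithmetic coding step is left out entirely.

```python
from typing import List, Tuple

Bitmap = Tuple[str, ...]            # rows of '0'/'1' characters (assumed representation)
Glyph = Tuple[Bitmap, int, int]     # (bitmap, x, y) of one shape on the page
Placement = Tuple[int, int, int]    # (index into the shape dictionary, x, y)


def same_size(a: Bitmap, b: Bitmap) -> bool:
    """True if both bitmaps have the same number of rows and the same row lengths."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def pixel_diff(a: Bitmap, b: Bitmap) -> int:
    """Number of pixels that differ between two bitmaps of the same size."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def encode_page(glyphs: List[Glyph], max_diff: int = 0) -> Tuple[List[Bitmap], List[Placement]]:
    """Store each distinct shape once and record where every glyph occurs.

    With max_diff > 0 slightly different bitmaps share one dictionary entry,
    which makes the encoding lossy.
    """
    shapes: List[Bitmap] = []
    placements: List[Placement] = []
    for bitmap, x, y in glyphs:
        for index, shape in enumerate(shapes):
            if same_size(shape, bitmap) and pixel_diff(shape, bitmap) <= max_diff:
                break               # reuse the existing dictionary entry
        else:
            index = len(shapes)     # no match found: add a new entry
            shapes.append(bitmap)
        placements.append((index, x, y))
    return shapes, placements


def decode_page(shapes: List[Bitmap], placements: List[Placement]) -> List[Glyph]:
    """Rebuild the glyph list from the dictionary and the recorded positions."""
    return [(shapes[index], x, y) for index, x, y in placements]


E = ("111", "100", "111")
T = ("111", "010", "010")
glyphs = [(E, 0, 0), (T, 4, 0), (E, 8, 0)]
shapes, placements = encode_page(glyphs)
print(len(shapes), placements)
```

With max_diff left at 0 the round trip through decode_page is exact; raising it trades fidelity for a smaller dictionary, which is the lossy behaviour the quoted text mentions for bitonal images.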
However, PDF could be better if the scanned raster images can be transformed into high quality vector graphics, for instance by applying optical character recognition to the scanned image, identifying the fonts, and carefully proofreading the resulting file. This procedure is often undesirable or time/cost prohibitive. Suitable fonts might not be available, or one may want to preserve the original document more exactly, including signatures, marginal comments, paper texture, or other markings. In such cases, DjVu is the better choice.

- -- 
Regards,
Carlos E.R.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Made with pgp4pine 1.76

iD8DBQFG6YOWtTMYHG2NR9URAi5+AKCIptpoo1Pa5UcEzw1o7xFh7a6PpgCcDSbR
+ySRLjp9R/rMVhskK6lplaY=
=Xh8z
-----END PGP SIGNATURE-----