OT: wondering about longterm archiving

Posted by dcouzin 
OT: wondering about longterm archiving
December 04, 2012 01:22PM
A digital video file is a bunch of 0's and 1's. Let's assume that the file data can be preserved for a very long time. Hard disk drives have short lives, but they can pass the data on to other storage media. I'm wondering what format (codec and container format) to make a digital video file today so it has the best chance of being readable 20, 50, or 100 years from now.

I believe it should be an uncompressed codec. Uncompressed codecs are transparent for decoding. In fact, if you display the hexadecimal dump of an 8-bit uncompressed 4:2:2 file on a monitor wide enough that the rows of data correspond to the rows of pixels, you can often make out the image. 10-bit uncompressed codecs are only a little less transparent. A 1999 Apple Developer piece, "Uncompressed Y'CbCr Video in QuickTime Files", documents all of this.
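That near-transparency is easy to demonstrate in a few lines. Here is a minimal sketch, assuming Apple's '2vuy' byte ordering (Cb, Y0, Cr, Y1 for each pair of pixels); the function name is my own:

```python
def decode_2vuy_row(raw, width):
    """Interpret one row of raw 8-bit 4:2:2 bytes ('2vuy' order:
    Cb, Y0, Cr, Y1) as a list of (Y, Cb, Cr) pixel triples."""
    pixels = []
    for i in range(0, width * 2, 4):       # 2 bytes/pixel, 4 bytes/pixel pair
        cb, y0, cr, y1 = raw[i:i + 4]
        pixels.append((y0, cb, cr))        # both pixels of the pair
        pixels.append((y1, cb, cr))        # share the Cb/Cr samples
    return pixels
```

No tables, no entropy decoding, no inter-frame state: the bytes are the picture, which is exactly what makes the format transparent.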

Compressed codecs come and go, and they are full of tricks. Even if the decoders for such codecs are fully publicly documented, I pity the future programmer needing to implement one many years later.

Perhaps the uncompressed codec should be R'G'B' rather than Y'CbCr, so the future programmer doesn't have to fuss with the matrix conversions, although these are publicly documented in BT.709. Apple's codec "None" is 8-bit uncompressed R'G'B'. (I haven't checked the color quality of QuickTime's transcodings from modern Y'CbCr codecs to old None.) Being 4:4:4 saves the future programmer a step versus 4:2:2.
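For the record, the matrix conversion the future programmer would need is only a few lines. A sketch with normalized values (the function is mine; the constants are the BT.709 luma coefficients):

```python
KR, KB = 0.2126, 0.0722        # BT.709 luma coefficients
KG = 1.0 - KR - KB

def ycbcr_to_rgb(y, cb, cr):
    """BT.709 Y'CbCr -> R'G'B', normalized: Y' in [0,1],
    Cb and Cr in [-0.5, 0.5]."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG      # luma definition solved for G'
    return r, g, b
```

So the fuss is small, but it is still one more piece of documentation that must survive along with the file.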

If an uncompressed digital video file were just a sequence of uncompressed digital still image files, the archiving would be pretty secure. The video container format can mess it up by muxing in the soundtrack. I found that even a silent video in .mov format had mysterious data strings stuck in between certain frames. The container file's header or trailer must give demuxing or playback instructions. Whether the container format interferes with the archival value of the file depends on how tricky it is, and how well documented. My inclination is to archive the video with a container format that sticks nothing between the video frames. Is there one like this? Also, the audio can be archived separately.
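A bare elementary stream, e.g. a raw .yuv file, is the closest thing to such a container: nothing but frames, so frame k sits at a byte offset you can compute by hand. A sketch, assuming 8-bit 4:2:2 (2 bytes per pixel):

```python
def frame_offset(k, width, height, bytes_per_pixel=2):
    """Byte offset of frame k in a raw frame stream with nothing
    between frames (8-bit 4:2:2 assumed: 2 bytes per pixel)."""
    return k * width * height * bytes_per_pixel

# e.g. frame 100 of a 1920x1080 stream
offset = frame_offset(100, 1920, 1080)
```

The whole "demuxing instruction" reduces to three numbers the archivist writes on the proverbial label: width, height, and bytes per pixel.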

Dennis Couzin
Berlin, Germany
Re: OT: wondering about longterm archiving
December 04, 2012 02:49PM
QT and RGB formats seem pretty erratic. There's the infamous gamma shift and pixelation with certain QT formats, which seems to depend on the decoder/encoder. "None" seems cool until you realize it's only 8-bit RGB with that erratic decode/encode in QT. I think it plays ball with some software; I just don't trust QT with RGB formats. I've had nothing but bad experiences with QT and RGB encoding, often preferring to decode to 10-bit or 8-bit Uncompressed 4:2:2 to get around the stupid gamma shift and pixelation issues, which I think are tied to a legacy encoder/decoder.

You could try looking at some of the more open RGB/image-sequence formats like JPEG2000 or DPX or OpenEXR, but I won't recommend doing the conversion from FCP7 or QT. DPX is uncompressed RGB with header info that stores timecode, and OpenEXR lets you store up to 16-bit half float (not sure if timecode is now a universal implementation), but these are overkill for compressed HDTV formats IMO, and they will need quite a lot of storage. DPX is the existing standard for DI workflows (which I'm sure you're familiar with), and OpenEXR with the ACES color space is touted as an emerging format to replace DPX, because half float is pretty cool. J2K is an industry standard used by the motion picture industry for delivering DCPs, and the licensing isn't tied up by the MPEG-LA. J2K uses DWT instead of the commonly used DCT compression. Avid has implemented J2K support in MC6.5.

Here's a plugin for encoding/decoding jpeg2000 in the Adobe apps:
[www.fnordware.com]

The plugin is kind of cool because it's compressed but you can play back in real time on today's machines. The downside is that this plugin truncates superwhite values, or it could be due to the lack of proper color space transformation support, but I don't know any RGB formats that preserve the superwhites/sub-blacks commonly found in Y'CbCr formats. Otherwise it seems to have wide format support, so you won't be stuck with something that doesn't work years later.

Here is a paper from INA, the French archive repository on Jpeg2000:
[www.fiatifta.org]



www.strypesinpost.com
Re: OT: wondering about longterm archiving
December 05, 2012 09:16AM
strypes, you provided great information. My preference for uncompressed codecs is based on a fantasy of digital archaeologists finding files with almost no documentation. Which digital files will be almost as "transparent" as old cine reels?

It does seem more realistic to piggyback on the choices of archivists. When the INA (Institut national de l'audiovisuel) chooses "open industry standards, recognized and used for archiving applications (compression and encapsulation format)", why should I have to worry? If there are billions of archived JPEG2000 files, how could future generations not keep the decoding knowledge alive? I wonder.

I'm not sure why the INA speaks of 4:2:2 when JPEG2000 files are usually R'G'B'. I think INA is using 10-bit R'G'B'. Then full decoding, through to display, requires specifying the gamma (or similar function) and also the R, G, and B primaries. The way we now specify primaries is to give their CIE chromaticity coordinates x,y. These come from a very old (1931) CIE system, which should survive for just such legacy purposes. The DCI (Digital Cinema Initiatives) JPEG2000 coding skips R'G'B' and is simply X'Y'Z', which after the specified gamma (2.6) becomes CIE X, Y, and Z. X'Y'Z' is archivally preferable to R'G'B' if only because different videos, from different eras, used different Y'UV, Y'CbCr or R'G'B', and different gammas, all of which can be represented in X'Y'Z' with gamma 2.6.
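Decoding DCI's X'Y'Z' really is that simple. A sketch, assuming 12-bit code values; 52.37 cd/m² is the DCI normalizing constant (48 cd/m² peak white plus headroom):

```python
def dci_decode(cv, peak=52.37):
    """Decode a 12-bit DCI X'Y'Z' code value (0..4095) to a linear
    CIE XYZ tristimulus value. Gamma is exactly 2.6; peak=52.37 cd/m^2
    is the DCI normalizing constant."""
    return peak * (cv / 4095.0) ** 2.6
```

One exponent and one constant per channel is about as little documentation as a future decoder could hope to need.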

One approach to longterm archiving is file simplicity and transparency, inelegant and bulky as may be. The other approach is standardization, with a great mass of similarly archived works compelling survival efforts. Standardization has been an elusive goal with video. JPEG2000 coding has the great advantage of also being a still photographic standard.

DPX strikes me as too open a structure for longterm archival value.

Dennis Couzin
Berlin, Germany
Re: OT: wondering about longterm archiving
December 05, 2012 12:29PM
People think that film is different because film has been around for a long time, and the film itself can be kept forever when looked after properly. I think we're at the age where storing on film is like storing a program on U-Matic tape: not in terms of fidelity, but in terms of U-Matic decks being accessible technology. It is hard to find a working U-Matic deck today. And then you have the issue of maintenance of parts, not being able to find a tape cleaner, etc. It is essentially the same problem as with video formats.

This is Thelma Schoonmaker on film archives:

[www.theatlantic.com]

DPX... Not sure what you mean by "too open". I haven't had problems reading DPX files. Sometimes the color space is not properly indicated in the file headers, so the file may say that it is linear, when it means log. Proper notation of the color space should suffice.

The problem is that JPEG2000 is a lossy format, at least at the bitrates used for archival. There is quasi-transparency, but it is not lossless except at very high bitrates.

But I am looking at using JPEG2000 as an intermediate codec in place of ProRes, because it is cross-platform and an open standard. And it's DWT, not DCT. But I'm trying to find a way to preserve superwhites/sub-blacks and color space, and hopefully I can find an encoder/decoder for OP1a MXF in PPro so as to bypass QT-related problems.



www.strypesinpost.com
Re: OT: wondering about longterm archiving
December 05, 2012 06:50PM
I quit film 20 years ago when Kodak euthanized a stock I liked.

Film was a fragile medium in two ways. Perforated flexible bands with transparent images are mechanically and chemically fragile. Also the generation-to-generation color transforms were very complex while the means of controlling color were very limited.

But an archeologist finding a non-disintegrated B&W film could photograph it frame by frame and restore it, likewise a color film (provided none of its dyes have completely faded). The restorations will be incomplete due to wear and tear, but like classical sculptures missing heads and arms they will still allow appreciation and understanding.

This works because film is barely coded. The step from small transparent image to large luminous image is obvious. The step from digital video file to display can't be that obvious, but some codecs come closer than others. With some, if the digital archaeologist finds the file nearly complete, and the documentation nearly comprehensible, he can make something of the video.

Freestand and piggyback are competing strategies for longterm archiving. The freestander aims for independence from standards. The piggybacker looks how "8½", "Stalker", "Tokyo Story" -- name your own favorites here -- have been coded for archive, and copies this exactly.

Dennis Couzin
Berlin, Germany
Re: OT: wondering about longterm archiving
December 05, 2012 07:18PM
We're not exactly creating a standard, but selecting an archival format. Adopting one in common use has certain perks, like the format won't die out without an escape route unless the entire species is somehow driven into extinction. You can perhaps look at the HDCAM SR Lite codec, but that is proprietary and owned by Sony.



www.strypesinpost.com
Re: OT: wondering about longterm archiving
December 05, 2012 08:05PM
Quote
strypes
DPX... Not sure what you mean by "too open". I haven't had problems reading DPX files. Sometimes the color space is not properly indicated in the file headers, so the file may say that it is linear, when it means log. Proper notation of the color space should suffice.

For a digital video file to be readable 20, 50, or 100 years from now, it, and all its parameters, must make sense then. Leaving color space wide open means DPX has too many parameters for archival use. DPX will disappear as soon as filmstocks disappear, and what will be left will be the imperfect file headers and imperfect documentation. Unused standards don't survive. BT.709 itself is very poorly written, really screwy. But as long as it's in use there's a community of interpreters that gives it its meaning. The 1931 CIE color standards survive because they're in use and also because every generation or so the CIE writes them up anew, in newer language with newer understanding.

Quote
strypes
The problem is that Jpeg2000 is a lossy format at least at specific bitrates used for archival. There is quasi transparency, but it is not lossless, except at very high bitrates. But I am looking at using Jpeg2000 as an intermediate codec in place of ProRes, because it is cross platform and open standard.

ProResHQ is strikingly good as an editing codec because of its behavior through multiple generations. A codec for archiving only needs visual fidelity through one generation. You might get an estimate of what degree of JPEG2000 compression is good for one generation from the DCI requirement: "For a frame rate of 24 FPS, a 2K distribution shall have a maximum of 1,302,083 bytes per frame". That's 250 Mbits/sec. For HD frame size it would be 234 Mbits/sec. Then figuring 10-bit rather than 12-bit, it's 195 Mbits/sec. Finally, X'Y'Z' coding isn't especially efficient, or suited to 4:2:2 subsampling. But the Z' is a blue record which can safely get half the bitrate of the X' and Y', and probably does in the DCP. The additional savings with JPEG2000 Y'CbCr 4:2:2 should get the bitrate down to about 163 Mbits/sec. Not peanuts, and with no assurances for multiple generation reencoding. ProResHQ is 176 Mbits/sec.
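The arithmetic behind those figures, spelled out step by step (variable names are mine):

```python
bytes_per_frame = 1302083                  # DCI maximum for a 2K frame
mbps_2k = bytes_per_frame * 8 * 24 / 1e6   # ~250 Mbit/s at 24 fps
mbps_hd = mbps_2k * 1920 / 2048            # ~234: HD width vs 2K width
mbps_10bit = mbps_hd * 10 / 12             # ~195: 10-bit vs 12-bit
mbps_422 = mbps_10bit * 2.5 / 3            # ~163: Z' at half the bitrate,
                                           # i.e. 2.5 of 3 channels
```

Each step is a simple proportion, so anyone can rerun the estimate with different assumptions (frame size, bit depth, chroma treatment).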

Quote
strypes
I'm trying to find a way to preserve superwhites/subblacks and color space...

If FCP's Proc Amp worked correctly you could preserve superwhites/subblacks by first linearly compressing Y' into the 64-940 video range. This compression should be harmless in 10-bit. If you're really going to make tens of thousands of JPEG2000's, then Photoshop should be able to batch color profile them to your satisfaction.
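A sketch of that linear compression, assuming full-range 10-bit input (the function name is mine):

```python
def fullrange_to_legal(y10):
    """Linearly map a full-range 10-bit Y' code (0..1023) into the
    legal video range 64..940, so superwhites and sub-blacks survive
    later conversions. Reversible apart from rounding, which is
    harmless in 10 bits."""
    return round(64 + y10 * (940 - 64) / 1023)
```

The inverse mapping recovers the original full-range values to within one code, which is the sense in which the compression is harmless.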

Dennis Couzin
Berlin, Germany
Re: OT: wondering about longterm archiving
December 06, 2012 07:15AM
ProRes isn't cross-platform and some software can't write to it, while some other software experiences chroma/gamma shift issues. I haven't seen many comparisons of J2K vs ProRes except in a Grass Valley white paper. Apple dismissed J2K in their own ProRes white paper, saying that it is processor intensive. I don't think J2K compares unfavorably against ProRes, as many formats use wavelet-based compression, including R3D and Cineform. Cineform has a white paper about how their codecs compare against HDCAM SR, although they are the vendor of their own product. The EBU did have a paper on J2K as well, but they were comparing it against H.264, at lower bitrates than what I was hoping to use.

I don't want to fiddle with knobs too much (e.g. setting a gamma alteration to pull it back down later), and prefer the software to handle them with its built-in LUTs. But right now it's still not too practical, as I can't find OP1a MXF support for it except maybe in the latest MC6.5 upgrade.



www.strypesinpost.com
Re: OT: wondering about longterm archiving
December 06, 2012 03:36PM
I wouldn't dream of archiving in ProRes. Good as it may be, it will blow away. Apple is Irresponsibility Inc.

Some basic considerations on Fourier (including DCT) image compression suggest that first-generation visual fidelity and multi-generation visual fidelity are different goals. That is, for a given bitrate, you'd design different codecs for the two goals. I don't know whether this also holds for wavelet image compression. Cineform writes as if it doesn't.

Cineform used a DCI-produced sequence for comparing Cineform444 with HDCAM SR. Certainly DCI used this sequence for judging its chosen JPEG2000 compression. Does Cineform anywhere show how their Cineform444 compares with JPEG2000?

DCI keeps the X',Y', and Z' JPEG2000's separate. You really have three JPEG2000's per frame, which is not efficient but probably good for longterm archiving.

Quote
strypes
I don't want to fiddle with knobs too much (eg. Setting a gamma alteration to pull it back down later)

Oh, I wish we had the knobs. FCS's gamma controls are a puzzle and FCP's Proc Amp is a disaster. Which video software companies are perfectly straight here, and which can be trusted to bridge between different companies' codecs/formats?

Dennis Couzin
Berlin, Germany
Re: OT: wondering about longterm archiving
December 08, 2012 02:02AM
>That is, for a given bitrate, you'd design different codecs for the two goals. I
>don't know whether this also holds for wavelet image compression. Cineform
>writes as if it doesn't.

The EBU talked about production/archiving codecs...

"Obviously, this (spatial) shift makes the task of the coder more challenging, especially for those algorithms based on a division of the picture into blocks (e.g. NxN DCT block), as in any later generation the content of each block is different to that in the previous generation."

[tech.ebu.ch]

The thing is, they were testing J2K at 100 Mbit/s. I'm wondering how it would fare at 145 Mbit/s or 185 Mbit/s against DNxHD or ProRes.



www.strypesinpost.com
Re: OT: wondering about longterm archiving
December 08, 2012 11:20PM
strypes, we're back to October 2010, when we looked at that EBU report "HDTV production codec tests" and did some tests of the ProRes family. Unfortunately that EBU report describes only the test methodology, because "it was agreed between the EBU project group and the vendors to make the reports about the test details available to EBU Members only." Those details (results) would be yummy. Is there an EBU member in this forum?

The lengthy discussion of methodology is fascinating, but it's like reading a mystery and finding the last chapter torn out. The extremely summary conclusions in EBU Recommendation 124 don't fit the story. Even the way they recommend 100 Mbit/s hints at something higher: "should not be less than 100 Mbit/s". They make no distinctions among the competing codecs, despite testing 10-bit DNxHD at no lower than 185 Mbit/s and testing 10-bit JPEG2000 at no higher than 100 Mbit/s.

DCI's quality studies led them to, in effect, JPEG2000 with 195 Mbits/s for 10-bit X'Y'Z' HD 24p. When recast as 4:2:2 Y'CbCr, as used by the EBU, the corresponding DCI video bitrate would be about 163 Mbit/sec.

How big are JPEGs? I chose a random frame of ProRes HD, not a pristinely detailed image such as the EBU and DCI used for their tests. QuickTime Conversion makes its highest-quality JPEG from the frame at 2.12 MB. (I don't know if it is 10-bit or 8-bit.) Twenty-four of these would make 427 Mbit/s! Using my Photoshop JPEG2000 plugin, the highest-quality lossy JPEG2000 made from a bitmap of the same frame is similarly 2.01 MB (corresp. 405 Mbit/s). It is more telling to use the quality scale of Photoshop's JPEG converter. "Maximum" quality 12 makes it 1.38 MB (corresp. 278 Mbit/s). "Maximum" quality 11 makes it 984 KB (corresp. 193 Mbit/s). "Maximum" quality 10 makes it 730 KB (corresp. 144 Mbit/s). "High" quality 9 makes it 535 KB (corresp. 105 Mbit/s). The numbers would be higher for a better-quality frame.
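The conversion used for these figures, for anyone checking the arithmetic (MB here means 2^20 bytes, which is the convention that reproduces the numbers):

```python
def mbits_per_sec(megabytes_per_frame, fps=24):
    """Mbit/s for a sequence of stills of the given size,
    with 1 MB = 2**20 bytes and the default 24 fps."""
    return megabytes_per_frame * 2**20 * 8 * fps / 1e6
```

So a 2.12 MB frame corresponds to about 427 Mbit/s, and the rest of the table follows the same formula.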

Dennis Couzin
Berlin, Germany
Re: OT: wondering about longterm archiving
December 09, 2012 12:29AM
I agree with you on that particular EBU paper. And I'm curious about test results between J2K and DNxHD/ProRes at the same bit rate.

If you have Premiere Pro or After Effects, I'd suggest using the fnord plugin that I mentioned earlier. There is quite a bit more control for color space and bit rate in that plugin. And no, I don't believe QT will get you a clean 10 bit J2K image.



www.strypesinpost.com
Re: OT: wondering about longterm archiving
December 09, 2012 02:11PM
I'm stuck with exporting bitmaps (8-bit) from FCP and working on them with Photoshop. I did first try fnord's j2k.8bi plugin in Photoshop. It made an inscrutably small JPEG2000 from the bitmap, so I switched to Photoshop's own plugin for the experiment.

Dennis Couzin
Berlin, Germany
Re: OT: wondering about longterm archiving
January 16, 2013 07:15PM
I've been investigating the same issue. While JPEG2000 seems to be favoured by the Library of Congress, I haven't been able to find open-source software that reliably encodes it. Also, it is at least partially proprietary.

There is an alternative codec - the open-source ffv1, developed by the makers of ffmpeg. It has the advantage of being able to deal with a range of bit-depths and chroma subsampling schemes, and is being used by the Österreichische Mediathek (Austrian film archives) and the City of Vancouver, among others. It's well supported by ffmpeg (obviously) which makes it easy to include in other software.

Discussion on ffv1 can be found on the Association of Moving Image Archivists mailing list here.
ffmpeg (which includes ffv1) is freely available here.
Re: OT: wondering about longterm archiving
January 16, 2013 10:32PM
From a general archival viewpoint, the introduction of any new tricky, as opposed to transparent, codec is not good news. With some archivists using JPEG2000 and other archivists using ffv1, etc., there is increased risk of some archived work not being readable 100 years from now. In fact, there is increased risk of all archived work not being readable 100 years from now. The more work that is archived in any particular form, the more chance that that form will be kept up, somewhere.

Österreichische Mediathek uses funny reasoning to justify their first axiom: that the archive format must be mathematically lossless.
Quote

... as the content is expected to last forever, it must undergo endless instances of conversion into future formats. Therefore each loss in quality, even if it is minimal, would - at the end of the migration chain - lead to a total loss of content.
First of all, the original archiving can be lossy, to a reasonable degree, and then it is the responsibility of later archivists, converting it into "future formats", to work losslessly. Second, this unmathematical thinking misses the point that compression schemes that weaken high spatial frequency detail can be applied ad infinitum without killing more than the high frequency detail.
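A toy illustration of that second point, using a bare quantizer (the crude core of any lossy coder): the first pass loses information, but re-applying the same quantizer loses nothing further, because the signal is already on the grid.

```python
def quantize(samples, step):
    """Snap each sample to a fixed grid: a stand-in for the lossy
    stage of an image coder."""
    return [round(s / step) * step for s in samples]

x = [0.31, 0.77, 0.12, 0.58]
g1 = quantize(x, 0.25)       # generation 1: lossy
g2 = quantize(g1, 0.25)      # generation 2: identical to generation 1
assert g1 == g2
```

Real codecs are not perfectly idempotent (block alignment, rounding, and color conversions shift between generations), but the degradation converges rather than compounding to "total loss of content".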

Granted, the Library of Congress has chosen lossless JPEG2000 for archiving moving images. A good reason for this is standardization itself.

I disagree with another decision of the Österreichische Mediathek.
Quote

Different video formats deal with different colorspaces. Because converting of colorspace implicates some inevitable losses, it should not be altered. Therefore the file codec must be able to map all eventually occuring colorspaces, including their subsampling options.
A better archival solution is to represent each video in the all-encompassing color space allowed in the DCI specification. This requires making decisions today, while the imperfectly documented color spaces used to date are more or less known.

Consider, for example, a video of today made to BT.709 standard. We know where the display primaries are supposed to be, so we know what triangular gamut will be staked off within the all-encompassing color space. We know how BT.709 says to convert Y'CbCr to R'G'B', and the rounding losses are trivial in 10-bit and above. But we don't know exactly what the display gamma is supposed to be: how the R'G'B' will be converted to the RGB (which drive the primaries). Some think the gamma is 2.22, but the EBU declares that incorrect and recommends 2.35. The difference is significant.

The archival video format will be in R'G'B' or in X'Y'Z', and the format should specify one definite gamma (or similar function) for removing those prime signs. So the archivist today must make a decision whether the BT.709 video requires gamma 2.22 or 2.35 or whatever, and he must code the video so it plays back correctly using the gamma specified for the archival format. The archivist today who sloughs off the BT.709 playback gamma uncertainty onto archivists 100 years hence is irresponsible.

With all videos made to date the archivist today must dig into recent history and make decisions about color space. With motion pictures, assuming he is working from a good projection positive, the archivist only needs to decide what kind of lamp was intended for projection (and perhaps what lens flare was typical).
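How much the 2.22-versus-2.35 question matters is easy to quantify: for a mid-gray signal, the displayed luminance differs by roughly 9 percent.

```python
signal = 0.5                      # mid-gray R'G'B' value
lum_222 = signal ** 2.22          # ~0.215 relative displayed luminance
lum_235 = signal ** 2.35          # ~0.196
ratio = lum_222 / lum_235         # ~1.094, i.e. about 9% brighter
```

That is well above a visible threshold for mid-tones, which is why the gamma decision cannot be left to future archivists.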

Videos and films each had/have their intended or conventional display appearances. For color videos and films (except Lippmann images and the like) color is color. We can and should recode it in a uniform way for archival preservation, so the intended or conventional display appearances can appear again.*

I disagree with the archival approach that makes moving images more special than they need to be. All moving images, so far, have been sequences of still images. Archiving is most robust, most likely to survive, when a single codec is used for the two. The color space issues for moving images are messy but not deep, and should be dealt with at the time of archiving. The color issues for non-digital still images are actually deep, because there may be multiple intended light sources.

Dennis Couzin
Berlin, Germany

*Note added 18 Jan: This forces adoption of the all-encompassing color space since there are color print