Demystifying secure and validated copy processes

Gunleik
October 12, 2023
Stian Zejlko Vrba

Our CTO Stian Zejlko Vrba has a PhD in informatics and works on math problems in his spare time. He has a relentless and pretty binary relation to truth.

Statements are true, or they are not.
And this attitude is further amplified in his relation to technology.

For the first three years of Quine’s existence, he successfully convinced us co-founders that we could not work on an end-user product until we had created a principled and solid infrastructure solution to the problem.

It was a bit crazy: Who would fund a company with a product without a UI?

No one, actually. But the initial pain we experienced has paid off many times over. To this day we haven’t lost a single file in any transaction, be it a local transaction or one that goes through the internet or a local network, in our commercial product QuineCore, in our proof-of-concept product GAMP, or in the newly released QuineCopy.

Copy errors happen. Not often, but a cable, a switch, computer memory, a card reader or a disk can be faulty, and then Secure, Verified Copying, which is part of our proprietary CopySafe™ technology, is there to alert you. It's at the base of all we do.

It is like a helmet for people riding a motorbike: hopefully you won’t ever need it, but if you crash, it can be the difference between life and death.
Don’t mess with the helmet!

Secure Verified Copying is not magic. In fact, outside the media industry, it is commodity infrastructure tech that runs in the background of almost any file or metadata transaction, to ensure consistency at all destinations in a distributed reality. Your phones, websites and even connected household appliances depend on secure, verified copying to function properly.

Somehow the media industry is an exception to that, especially at the lower end.

In our industry, we use emotional descriptors, even on tech.
We say things like “with this camera I can recover 3 stops of highlights from the shot”, which sounds like magic, but in reality is just about rearranging bits and bytes so that data that is already in the file becomes visible.

Can you imagine the reaction of Stian, who had years of experience in precisely moving data and keeping it consistent in enormous mission-critical datasets in industrial applications, the first time someone asked him:
“Do we checksum our files?”
That silly, unknowing someone was in fact me...

He had to hold back a slight rage and despair when taking in my lack of understanding, but he took the time to teach me.

Secure copying is a procedure that includes validation.
Checksumming is a tool to validate that procedure, if the procedure is followed rigorously.
And if the procedure is not followed rigorously, your files are not securely copied or verified, no matter how many hashes you write to an MHL file or a receipt. Without the proper procedure, the checksums are at best misleading; at worst they create a false sense of security.

How little I, as a DoP and workflow supervisor, knew at the time about the tech we all used every day…

The interesting thing about procedures is that they take time. If you cut corners to save time, you haven’t followed the procedure, which in this situation leaves you with a piece of data that is not securely copied and verified.

You either have a verified file, or you don’t have a verified file.
You never ever have a somewhat verified file.


A quick definition of verified copy in the legal space:

A verified copy is a duplicate of an original document that has been certified as an exact reproduction by the person responsible for the original.

The key element, which is also relevant to us, is that every copy made needs to be compared to the original before you state that the copy is good.

Fast forward 6+ years. We decided to launch QuineCopy at IBC2023 back in January this year.

What a “secure and validated copy” really is, is as wishy-washy in the media field as it was 10 years ago.


People love to discuss md5 vs XXHASH64, but very few know exactly why these are important or how they are actually used to secure your files.
Users open their MHL files, see a bunch of UID-like letters and numbers attached to their receipt, and feel content, because:
“The files have been hashed, right?”
And technically that is probably true, but it does not follow that you have a secure and validated copy.
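
To make that concrete, here is a minimal Python sketch of what a hash in a receipt actually is: a fingerprint of the bytes that were read at one moment. The names `file_digest_hex` and `matches_recorded` are made up for this illustration and are not taken from any MHL tool; md5 is used only because it is mentioned above.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files never sit fully in memory


def file_digest_hex(path, algorithm="md5"):
    """Return the hex digest of the file at `path`, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while chunk := handle.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(path, recorded_hex, algorithm="md5"):
    """True only if the file, as read right now, hashes to the recorded value."""
    return file_digest_hex(path, algorithm) == recorded_hex.lower()
```

A match tells you that the bytes read now equal the bytes that were hashed back then. It says nothing about whether the bytes hashed back then were a faithful read of the source, and that is exactly what the procedure is for.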

There are a few ways to spot those who cut corners, but first you need to understand the procedure:

Step 1

You read the file from a source. The data you read is hashed, and simultaneously written to one or more destinations.

At this stage you can also choose to read back what is written to the destinations, generate a hash and compare it to the hash generated from the source data. If you do this, you know that you have written an exact replica of what you read.

Many seem to think that this means that the file is identical on the source and the destination, but the truth is you don’t know that yet.  
All you know now is that what you wrote is equal to what you read. 
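
Step 1 could be sketched in Python roughly as follows. This is an illustration under assumptions, not how QuineCopy is implemented: the function names, the 1 MiB chunk size and the choice of md5 are all made up for the example, and the read-back here goes through the normal operating-system file cache, which a real tool would have to deal with.

```python
import hashlib
import os
from contextlib import ExitStack

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def hash_file(path, algorithm="md5"):
    """Hash a file by reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while chunk := handle.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_hash(source_path, destination_paths, algorithm="md5"):
    """Read the source once, hash the bytes read, and write them to every destination."""
    digest = hashlib.new(algorithm)
    with ExitStack() as stack:
        source = stack.enter_context(open(source_path, "rb"))
        outputs = [stack.enter_context(open(p, "wb")) for p in destination_paths]
        while chunk := source.read(CHUNK_SIZE):
            digest.update(chunk)
            for out in outputs:
                out.write(chunk)
        for out in outputs:
            out.flush()
            os.fsync(out.fileno())  # ask the OS to push the data to the device
    return digest.hexdigest()


def step_one(source_path, destination_paths, algorithm="md5"):
    """Copy, then read each destination back and compare with the hash of what was read."""
    read_hash = copy_and_hash(source_path, destination_paths, algorithm)
    readback_ok = {p: hash_file(p, algorithm) == read_hash for p in destination_paths}
    return read_hash, readback_ok
```

Note what `step_one` never does: it never looks at the source a second time. Every comparison is against the bytes that came out of the first read, whatever they were.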

The reading stage is where most faults are introduced. We have all experienced faulty RAM, a flaky USB cable, an overheated card reader, a bad network cable and so on at some stage in our lives, even outside the professional media-management space.

So you risk ending up just verifying that you have dutifully replicated a faulty file.

Which is suboptimal.


Thus
Step 2

You need to re-read and re-hash the file from your source. If the hashes from the first and second read of the source file match, you can assume your initial read was not faulty.
Even then it is not necessarily 100% certain that no errors are present. It is just very, very unlikely that there are errors in the transaction: having the exact same bits corrupted twice due to a common hardware fault is exceedingly unlikely. If the hashes from the destination files also match the source hashes, you know the data you have written is identical to the source.
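
As a rough Python sketch of Step 2, assuming the hash from the first read in Step 1 is available as `first_read_hash`: the source is read and hashed a second time, each destination is read and hashed, and the copy only counts as verified if everything agrees. The names `verify_copy` and `VerificationResult` are hypothetical, chosen for this illustration only.

```python
import hashlib
from dataclasses import dataclass

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def hash_file(path, algorithm="md5"):
    """Hash a file by reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while chunk := handle.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class VerificationResult:
    source_reads_agree: bool   # did the second source read hash like the first?
    destinations_match: dict   # destination path -> True if its hash equals the first-read hash

    @property
    def verified(self):
        # Verified or not verified; there is no "somewhat".
        return (
            self.source_reads_agree
            and bool(self.destinations_match)
            and all(self.destinations_match.values())
        )


def verify_copy(first_read_hash, source_path, destination_paths, algorithm="md5"):
    """Re-read the source and read every destination, comparing all hashes."""
    second_read_hash = hash_file(source_path, algorithm)
    destinations_match = {
        path: hash_file(path, algorithm) == first_read_hash
        for path in destination_paths
    }
    return VerificationResult(second_read_hash == first_read_hash, destinations_match)
```

The result is deliberately a plain yes or no. If `source_reads_agree` is false, the destinations may still match the first read perfectly, and the copy is still not verified.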

As you may infer from the above procedure, there are some physical limits to how fast all of this can be done, independently of what hashing method is used.

The hashing itself doesn’t necessarily add a lot of time to the process. Step 1, without the reading and hashing on the destination, should take about as long as a Finder or Explorer copy. But if your preferred solution for copying files reports that you are done at this stage, you are being duped. You don’t yet know whether your initial read was faulty. A full second read of the source material is required.

There are in fact a couple of ways you can spot if this is the case.

1st: The process goes as fast as your Finder/Explorer copy. There is no physical way that this can result in a truly validated copy.

2nd: If you look into Activity Monitor (Mac) or Task Manager (Windows) and see that your tool creates a lot of disk activity after reporting that the file is verified, the application has probably reported success after completing only the first step of the process, and isn’t actually done yet.

These are indicators that your secure copy has probably not been secured at all.
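
The first indicator can be backed by a back-of-the-envelope estimate. The Python sketch below is a simplification under stated assumptions: hashing is treated as free, reads and writes overlap perfectly, nothing is served from cache, and the destination read-back runs in parallel with the second source read. The throughput numbers in the example are invented for illustration.

```python
def plain_copy_seconds(size_mb, source_read_mbps, dest_write_mbps):
    """One pass: read the source while writing the destination."""
    return size_mb / min(source_read_mbps, dest_write_mbps)


def verified_copy_seconds(size_mb, source_read_mbps, dest_write_mbps, dest_read_mbps):
    """Lower bound for a verified copy: a copy pass plus a verification pass.

    The verification pass re-reads the source and reads back the destination,
    assumed here to run side by side, so the slower of the two sets the pace.
    """
    copy_pass = size_mb / min(source_read_mbps, dest_write_mbps)
    verify_pass = size_mb / min(source_read_mbps, dest_read_mbps)
    return copy_pass + verify_pass


if __name__ == "__main__":
    # Hypothetical setup: 100 GB of footage, a 400 MB/s card reader, a 1000 MB/s destination.
    print(plain_copy_seconds(100_000, 400, 1000))           # 250.0
    print(verified_copy_seconds(100_000, 400, 1000, 1000))  # 500.0
```

Under these assumptions, when the source is the slowest device, the verified copy cannot finish in much less than twice the time of a plain copy, no matter which hash is used.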

Some paid software does not even attempt to validate copies of assets to network drives (NAS). This is... not good. By nature, network transactions have a much higher probability of packet loss (meaning a failed transaction) than a local copy.

You can get a very strong hunch of whether your software is indeed safe mostly by watching Task Manager (Windows) or Activity Monitor (Mac), and by comparing the time the software takes to tell you that your copy is safe with the time your computer needs for a plain system copy. Unfortunately, being "approved by" some external authority is not enough.
We recommend that you test with files that are larger than your computer's RAM, so that reads cannot be served entirely from the memory the operating system uses as a file cache. Thus:
If your computer has 32 GB of RAM, the files you test with should be larger than 32 GB.
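
If you have no footage of that size at hand, a test file can be generated. The Python helper below is a small sketch with a made-up name, `write_random_file`. It fills the file with random bytes so that it cannot be stored sparsely or compressed away by the file system.

```python
import os


def write_random_file(path, size_bytes, chunk_size=8 * 1024 * 1024):
    """Write `size_bytes` of random data to `path`, one chunk at a time."""
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            block = os.urandom(min(chunk_size, remaining))
            handle.write(block)
            remaining -= len(block)


# Example (not run here): a 40 GiB file for a machine with 32 GB of RAM.
# write_random_file("test_clip.bin", 40 * 1024**3)
```

Generating tens of gigabytes of random data takes a while, and the file name and size in the example are placeholders.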

At Quine, our developers are scientists and not from the film industry. They are used to experimenting to get the best results, and also to finding out what is going on if something is too good to be true.

So when the numbers don't add up, we tend to test what is wrong.

In our labs we can do simple experiments to test the behaviour of our own software, but also of other software, like altering the content of a file that has been copied to see if the mistake is picked up.
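
A minimal version of such an experiment could look like the Python sketch below. It is an illustration only, not our lab tooling: `flip_one_bit` and `corruption_is_detected` are hypothetical names. It changes a single bit in a copy without changing its size, then checks that a hash comparison against the source notices. Only run it on disposable test files.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def hash_file(path, algorithm="md5"):
    """Hash a file by reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while chunk := handle.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()


def flip_one_bit(path, offset):
    """Flip the lowest bit of the byte at `offset`, in place, keeping the file size unchanged."""
    with open(path, "r+b") as handle:
        handle.seek(offset)
        original = handle.read(1)
        if not original:
            raise ValueError("offset is beyond the end of the file")
        handle.seek(offset)
        handle.write(bytes([original[0] ^ 0x01]))


def corruption_is_detected(source_path, copy_path, offset=0, algorithm="md5"):
    """Corrupt the copy on purpose and report whether a hash comparison notices."""
    flip_one_bit(copy_path, offset)
    return hash_file(source_path, algorithm) != hash_file(copy_path, algorithm)
```

To test a copy tool rather than the hash itself, the same `flip_one_bit` idea applies: alter the copy after the tool has reported it as verified, then ask the tool to verify again and see whether it complains.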

For non-developers that is not trivial. For us, these tests are the air we breathe and what we stake our reputation on.

The moral of all this is:
There is popular software out there that should not be trusted.

Users should be able to validate for themselves whether they should feel safe or not, and an MHL file is unfortunately not a reason to feel safe.

Please run similar tests on QuineCopy and QuineIngest and if you see misbehaviour, or manage to make a bad copy without us reporting it, shout it out.

Most importantly:
If something seems too good to be true, it probably is.

Secure copying is a procedure, not magic, and if that procedure is not followed, your files are not safe.

Are there no ways to optimize the process?

There are many, in fact. Most are related to how you deal with multiple files, multiple destinations and queueing of tasks. If your setup is fast enough for hashing to become the bottleneck, XXHASH64 is significantly faster than md5, but also less secure: it is a non-cryptographic hash with a 64-bit digest, against md5's 128 bits.

Unless the programming-work is sloppy, the process itself should not create too much overhead per file and one application should be pretty similar to another.

Instead of complaining about an industry where there are a lot of “opinions” on technical truths, we decided to get QuineCopy out there to make secure copying a commodity in the media industry, too.

The typical software company in our field is started by someone like me: an artist who sees a problem, teaches themselves a bit of basic programming to try to solve it, and then sells the solution.

Quine is a bit of a different animal.

We first went through 2 years of Horizon 2020 research with partners like broadcasters, universities and Technicolor, and created a POC product before establishing the company. When we decided to start the company, we joined another 2 years of research on real-time, on-set file and metadata management with the same partners, while silently researching and developing core technology and data models for automated file and metadata transactions on-set, remotely and locally.

It took us nearly 3 years to get to the point where the data models and solutions had a UI, but since then, the system has been rock solid.

And like Stian:
We hate to lie.

Happy secure copying, everyone.

Gunleik

------
Edits:
2023-10-18
Spelling and signature

Added link to explanation of what a hash is and what it can be used for, outside listing it in an MHL table
Added link to MediaHashList.org
