Difference: FightEntropy (1 vs. 26)

Revision 26 - 31 Oct 2017 - TobyCabot

Line: 1 to 1
 Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

Line: 80 to 80
 http://arstechnica.com/science/news/2010/11/preserving-science-how-data-gets-lost.ars - an article about this topic in the context of scientific research

http://blog.longnow.org/02014/02/24/iceisee-3-to-return-to-an-earth-no-longer-capable-of-speaking-to-it/ - a sad tale of a perfectly functional satellite having to be mothballed because we can no longer communicate with it.

Added:
>
>
https://xkcd.com/1909/ - XKCD has a funny perspective on the subject

Revision 25 - 25 Feb 2014 - TobyCabot

Line: 1 to 1
Deleted:
<
<
 Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

Line: 79 to 78
 http://lwn.net/Articles/240528/ - a link to an article on this topic by Jeremy Allison of the Samba team. Some of the comments are interesting, too.

http://arstechnica.com/science/news/2010/11/preserving-science-how-data-gets-lost.ars - an article about this topic in the context of scientific research

Added:
>
>
http://blog.longnow.org/02014/02/24/iceisee-3-to-return-to-an-earth-no-longer-capable-of-speaking-to-it/ - a sad tale of a perfectly functional satellite having to be mothballed because we can no longer communicate with it.

Revision 24 - 16 Feb 2013 - TobyCabot

Line: 1 to 1
Deleted:
<
<

Introduction

  Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

Changed:
<
<

Discussion

In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable (i.e. http://www.informationweek.com/story/IWK20010719S0003 ), or the format that the data is stored in becomes indecipherable. While both reasons produce the same result, they are different in terms of the actions that you need to take to prevent them.

>
>
In my experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable (e.g. http://www.informationweek.com/story/IWK20010719S0003 ), or the format that the data is stored in becomes indecipherable. While both reasons produce the same result, they are different in terms of the actions that you need to take to prevent them.
  Media I've used:
  • cassette tape
Line: 21 to 18
 
  • CDROM
  • DVD-ROM
  • USB Thumb drive
Added:
>
>
  • CompactFlash
  • SecureDigital cards
  Media I haven't used, but know about:
  • Hollerith punchcards
Line: 30 to 29
 
  • Zip drives (100MB, 250MB)
  • Jaz drives
Changed:
<
<
Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed. This happens frequently because the media and the machines that read it are physical devices and therefore age and degrade over time. A failure in either the media or the reader is enough to render the data lost forever, but if one floppy fails you lose only the data on that floppy but you lose it permanently. If the floppy reader fails then you lose access to all of the data on all of that type of media, but you can potentially get it back by finding someone else that has that type of reader and borrowing it from her.
>
>
Media can become unreadable if the media itself fails (magnets, scratches, click of death) or if the reader breaks and can't be fixed. The media and the machines that read it are physical devices and age and degrade over time. A failure in either the media or the reader is enough to render the data lost forever: if one floppy fails you lose only the data on that floppy but if your floppy reader fails then you lose access to all of the data on all of that type of media. You can potentially get it back by finding someone else that has that type of reader and borrowing it from her, but you don't want to count on that.
 
Changed:
<
<
The contents of the media can become unreadable, even if the media is readable, if there is no software that can decipher the format that the data was stored in. You can think of storing data in terms of encrypting it using an encryption algorithm. Some encryption algorithms are stronger (i.e. harder to decipher) than others, and some algorithms are more widely understood than others. The software that you use to read and write the data are the encryption key. If you understand the "encryption algorithm" that you're using to write the data then you will not have to worry about deciphering it, but if you don't understand the algorithm then you are dependent on that software to read it for you. My Mom was dependent on software that could read data encrypted in a certain Microsoft algorithm that neither she nor I understood, and over time Microsoft themselves "lost the key" to that algorithm so her data was permanently locked up and Mom didn't have the key.
>
>
Data can become unreadable, even if the media is readable, if there is no software that can decipher the format that the data was stored in. Think of storing data in terms of encrypting it using an encryption algorithm. Some encryption algorithms are stronger (i.e. harder to decipher) than others, and some algorithms are more widely understood than others. The software that you use to read and write the data is the encryption key. If you understand the "encryption algorithm" that you're using to write the data then you will not have to worry about deciphering it, but if you don't understand the algorithm then you are dependent on that software to read it for you. My Mom was dependent on software that could read data encrypted in a certain Microsoft algorithm that neither she nor I understood, and over time Microsoft themselves "lost the key" to that algorithm so her data was permanently locked up and Mom didn't have the key.
  This is the most important reason (among many) why you should never store any data in a format that's not well documented. If the only person that understands the format is the person or company that produced it, they can decide at any time that they don't want to support it anymore and you have very little recourse. You could try to figure out how the format works by looking at your documents, but the process (known as "reverse-engineering") is time-consuming and boring and may be illegal in some cases. Note that the key distinction is not whether the format is proprietary or non-proprietary, it's whether the designer of the format has provided enough documentation of it that other people can read and write it. Some proprietary formats, for example Adobe Portable Document Format, are very well documented so many tools can read and write them.

Revision 23 - 16 Feb 2013 - TobyCabot

Line: 1 to 1
 

Introduction

Changed:
<
<
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.
>
>
Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.
  What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

Revision 22 - 15 Nov 2010 - TobyCabot

Line: 1 to 1
 

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

Line: 79 to 79
  http://lwn.net/Articles/240528/ - a link to an article on this topic by Jeremy Allison of the Samba team. Some of the comments are interesting, too.
Changed:
<
<
-- TobyCabot - 22 Feb 2002 - 05 Jul 2007
>
>
http://arstechnica.com/science/news/2010/11/preserving-science-how-data-gets-lost.ars - an article about this topic in the context of scientific research

Revision 21 - 05 Jul 2007 - TobyCabot

Line: 1 to 1
 

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

Line: 77 to 77
  http://photoshopnews.com/?p=226 - this issue affects even cameras. Here's a story about an expensive camera that uses a proprietary format that can only be read by that vendor's software.
Added:
>
>
http://lwn.net/Articles/240528/ - a link to an article on this topic by Jeremy Allison of the Samba team. Some of the comments are interesting, too.
 
Changed:
<
<
-- TobyCabot - 22 Feb 2002 - 13 Oct 2004
>
>
-- TobyCabot - 22 Feb 2002 - 05 Jul 2007
 

Revision 20 - 07 Oct 2005 - TobyCabot

Line: 1 to 1
 

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

Line: 41 to 41
 

Notes

Added:
>
>
In the summer of 2005 the Commonwealth of Massachusetts decided to use the OpenDocument document format instead of MS's proprietary formats. One of their key issues was the ability to read documents for a very long time. Here's an analysis of that decision: http://www.dwheeler.com/essays/why-opendocument-won.html
 Dublin Core Metadata Initiative (http://dublincore.org/) offers standards for encoding many different types of data, for example http://dublincore.org/documents/dcmi-terms/.

Things that can be stored in standard formats:

Revision 19 - 19 Apr 2005 - TobyCabot

Line: 1 to 1
 

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

Line: 73 to 73
  http://www.theregister.co.uk/2005/02/21/forgetting_digital_memories/ - Digital memories: we can forget them for you wholesale!
Added:
>
>
http://photoshopnews.com/?p=226 - this issue affects even cameras. Here's a story about an expensive camera that uses a proprietary format that can only be read by that vendor's software.
 -- TobyCabot - 22 Feb 2002 - 13 Oct 2004

Revision 18 - 17 Mar 2005 - TobyCabot

Line: 1 to 1
 

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

Line: 41 to 41
 

Notes

Added:
>
>
Dublin Core Metadata Initiative (http://dublincore.org/) offers standards for encoding many different types of data, for example http://dublincore.org/documents/dcmi-terms/.
 Things that can be stored in standard formats:
Changed:
<
<
>
>
 
Changed:
<
<
Text wins over proprietary formats (see Project Gutenberg).
>
>
Text wins over proprietary formats (see IETF, Project Gutenberg).

Documented proprietary formats win over undocumented formats (e.g. RTF over DOC).

 
Changed:
<
<
Documented proprietary formats win over undocumented formats (i.e. RTF over .DOC).
>
>
Lossless wins over lossy (e.g. FLAC over MP3).
  Backup vs. Archive: short-term vs long-term, bulk data vs document-oriented.

Revision 17 - 24 Feb 2005 - TobyCabot

Line: 1 to 1
 

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

Line: 16 to 16
 
  • 1/4 cartridge tape (QIC)
  • 5 1/4-inch floppy (CP/M, Apple ][, trs-80, C64, PC, hard sector, soft sector, single-sided, double-sided)
  • 4MM DAT/DDS/DDS2
Added:
>
>
  • TZ/TK/DLT tape cartridges
 
  • 3 1/2-inch floppy (720k, 1.44MB)
  • CDROM
  • DVD-ROM
Added:
>
>
  • USB Thumb drive
  Media I haven't used, but know about:
  • Hollerith punchcards
  • Paper tape
  • 9-track tape
Deleted:
<
<
  • TZ/TK/DLT tape cartridges
 
  • Bernoulli box drives
  • Zip drives (100MB, 250MB)
  • Jaz drives
Line: 66 to 67
  http://www.ietf.org/internet-drafts/draft-ietf-geopriv-dhcp-civil-04.txt - an IETF draft for "civic location," also has some good references
Added:
>
>
http://www.theregister.co.uk/2005/02/21/forgetting_digital_memories/ - Digital memories: we can forget them for you wholesale!
 -- TobyCabot - 22 Feb 2002 - 13 Oct 2004

Revision 16 - 13 Oct 2004 - TobyCabot

Line: 1 to 1
 

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

Line: 64 to 64
  http://www.itl.nist.gov/div895/carefordisc/index.html - The US feds help keep your data safe.
Changed:
<
<
-- TobyCabot - 22 Feb 2002 - 25 Jan 2004
>
>
http://www.ietf.org/internet-drafts/draft-ietf-geopriv-dhcp-civil-04.txt - an IETF draft for "civic location," also has some good references

-- TobyCabot - 22 Feb 2002 - 13 Oct 2004

Revision 15 - 24 Jan 2004 - TobyCabot

Line: 1 to 1
 

Introduction

Changed:
<
<
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.
>
>
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.
  What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.
Line: 35 to 35
  This is the most important reason (among many) why you should never store any data in a format that's not well documented. If the only person that understands the format is the person or company that produced it, they can decide at any time that they don't want to support it anymore and you have very little recourse. You could try to figure out how the format works by looking at your documents, but the process (known as "reverse-engineering") is time-consuming and boring and may be illegal in some cases. Note that the key distinction is not whether the format is proprietary or non-proprietary, it's whether the designer of the format has provided enough documentation of it that other people can read and write it. Some proprietary formats, for example Adobe Portable Document Format, are very well documented so many tools can read and write them.
Changed:
<
<
The most important set of undocumented proprietary data formats are the Microsoft Office formats for documents, spreadsheets, presentations, etc. These formats are important because so much data is encoded in them every day, but they are not documented, and they change frequently. Many people spend many hours reverse-engineering them, but this effort is frustrated when a new version of the formats appears and the reverse-engineering process must start from scratch.
>
>
The most important set of undocumented proprietary data formats is the Microsoft Office formats for documents, spreadsheets, presentations, etc. These formats are important because so much data is encoded in them every day, but they are not documented, and they change frequently. Many people spend many man-years reverse-engineering them, but this effort is frustrated when a new version of the formats appears and the reverse-engineering process must start from scratch. And no, the XML-based Office 2003 formats are not any less proprietary than the previous binary ones. For one thing, they're still undocumented; for another, they're probably patented, so even if you were capable of figuring out how to read them it would be against the law for you to do so.
 

Notes

Line: 61 to 62
  wrjpgcom is a tool to write data to the comment field of a jpeg image file. Need to find the source.
Changed:
<
<
-- TobyCabot - 22 Feb 2002
>
>
http://www.itl.nist.gov/div895/carefordisc/index.html - The US feds help keep your data safe.

-- TobyCabot - 22 Feb 2002 - 25 Jan 2004

Revision 14 - 19 Jan 2004 - TobyCabot

Line: 1 to 1
 

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

Line: 59 to 59
  http://www.ecommercetimes.com/perl/story/31436.html
Added:
>
>
wrjpgcom is a tool to write data to the comment field of a jpeg image file. Need to find the source.
 -- TobyCabot - 22 Feb 2002

Revision 13 - 03 Sep 2003 - TobyCabot

Line: 1 to 1
 

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

Line: 57 to 57
  http://computerworld.co.nz/webhome.nsf/NL/A7D9D35CE6CC6DE3CC256D5F00728810
Added:
>
>
http://www.ecommercetimes.com/perl/story/31436.html
 -- TobyCabot - 22 Feb 2002

Revision 12 - 15 Jul 2003 - TobyCabot

Line: 1 to 1
 

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

Line: 14 to 14
 
  • 8-inch floppy disk
  • Magneto-optical cartridge disk
  • 1/4 cartridge tape (QIC)
Changed:
<
<
  • 5 1/4-inch floppy (CP/M, Apple ][, trs-80, C64, PC)
>
>
  • 5 1/4-inch floppy (CP/M, Apple ][, trs-80, C64, PC, hard sector, soft sector, single-sided, double-sided)
 
  • 4MM DAT/DDS/DDS2
Changed:
<
<
  • 3 1/2-inch floppy
>
>
  • 3 1/2-inch floppy (720k, 1.44MB)
 
  • CDROM
  • DVD-ROM
Line: 55 to 55
  http://slashdot.org/article.pl?sid=02/03/03/1821227&tid=126 - BBC digitizes old book, 15 years later the digital version is useless but the 1000-year-old book can still be read.
Added:
>
>
http://computerworld.co.nz/webhome.nsf/NL/A7D9D35CE6CC6DE3CC256D5F00728810
  -- TobyCabot - 22 Feb 2002

Revision 11 - 12 Jul 2003 - TobyCabot

Line: 1 to 1
 

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

Line: 53 to 53
  Allow users to export from your program. Provide a means to dump data from your internal format to some standard format.
Added:
>
>
http://slashdot.org/article.pl?sid=02/03/03/1821227&tid=126 - BBC digitizes old book, 15 years later the digital version is useless but the 1000-year-old book can still be read.
 -- TobyCabot - 22 Feb 2002

Revision 10 - 12 Apr 2003 - TobyCabot

Line: 1 to 1
Changed:
<
<
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.
>
>

Introduction

In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

  What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.
Changed:
<
<
In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable (e.g. http://www.informationweek.com/story/IWK20010719S0003 ), or the format that the data is stored in becomes indecipherable.
>
>

Discussion

In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable (e.g. http://www.informationweek.com/story/IWK20010719S0003 ), or the format that the data is stored in becomes indecipherable. While both reasons produce the same result, they differ in the actions that you need to take to prevent them.

  Media I've used:
  • cassette tape
Changed:
<
<
  • 8-inch floppy
  • MO cartridge disk
>
>
  • 8-inch floppy disk
  • Magneto-optical cartridge disk
 
  • 1/4 cartridge tape (QIC)
  • 5 1/4-inch floppy (CP/M, Apple ][, trs-80, C64, PC)
Changed:
<
<
  • 4MM DAT/DDS
>
>
  • 4MM DAT/DDS/DDS2
 
  • 3 1/2-inch floppy
  • CDROM
  • DVD-ROM
Changed:
<
<
Media I haven't, but know about:
>
>
Media I haven't used, but know about:
  • Hollerith punchcards
  • Paper tape
 
  • 9-track tape
Changed:
<
<
  • TZ/TK/DLT
>
>
  • TZ/TK/DLT tape cartridges
  • Bernoulli box drives
  • Zip drives (100MB, 250MB)
  • Jaz drives

Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed. This happens frequently because the media and the machines that read it are physical devices and therefore age and degrade over time. A failure in either the media or the reader is enough to put the data out of reach, but the two failures differ in scope. If one floppy fails, you lose only the data on that floppy, but you lose it permanently. If the floppy reader fails, you lose access to all of the data on all media of that type, but you can potentially get it back by finding someone else who has that type of reader and borrowing it from her.

The contents of the media can become unreadable, even if the media is readable, if there is no software that can decipher the format that the data was stored in. You can think of storing data in terms of encrypting it using an encryption algorithm. Some encryption algorithms are stronger (i.e. harder to decipher) than others, and some algorithms are more widely understood than others. The software that you use to read and write the data is the encryption key. If you understand the "encryption algorithm" that you're using to write the data then you will not have to worry about deciphering it, but if you don't understand the algorithm then you are dependent on that software to read it for you. My Mom was dependent on software that could read data encrypted in a certain Microsoft algorithm that neither she nor I understood. Over time Microsoft themselves "lost the key" to that algorithm, so her data was permanently locked up and Mom didn't have the key.

 
Changed:
<
<
Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed.
>
>
This is the most important reason (among many) why you should never store any data in a format that's not well documented. If the only one who understands the format is the person or company that produced it, they can decide at any time that they don't want to support it anymore and you have very little recourse. You could try to figure out how the format works by looking at your documents, but the process (known as "reverse-engineering") is time-consuming and boring and may be illegal in some cases. Note that the key distinction is not whether the format is proprietary or non-proprietary; it's whether the designer of the format has provided enough documentation of it that other people can read and write it. Some proprietary formats, for example Adobe Portable Document Format, are very well documented so many tools can read and write them.

The most important set of undocumented proprietary data formats is the Microsoft Office formats for documents, spreadsheets, presentations, etc. These formats are important because so much data is encoded in them every day, but they are not documented, and they change frequently. Many people spend many hours reverse-engineering them, but this effort is frustrated when a new version of the formats appears and the reverse-engineering process must start from scratch.
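
To make the "format as encryption key" idea concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the record, its field names, and the packed binary layout do not correspond to any real word processor format. It stores the same record twice, once in a binary layout that only the writing program knows and once as plain text.

```python
import json
import struct

# Hypothetical record; the field names are invented for this sketch.
record = {"title": "Letter to Aunt May", "year": 1993, "pages": 2}

# "Undocumented" style: a packed binary layout. The layout string is the "key":
# little-endian, 20-byte title, unsigned 16-bit year, unsigned 8-bit page count.
LAYOUT = "<20sHB"
opaque = struct.pack(
    LAYOUT, record["title"].encode("ascii"), record["year"], record["pages"]
)

# "Documented" style: plain text that a person can read with no special tools.
readable = json.dumps(record, indent=2, sort_keys=True)


def decode_opaque(blob: bytes) -> dict:
    """Recover the record from the binary form; only works if LAYOUT is known."""
    title, year, pages = struct.unpack(LAYOUT, blob)
    return {
        "title": title.rstrip(b"\x00").decode("ascii"),
        "year": year,
        "pages": pages,
    }
```

Without the `LAYOUT` string the 23 bytes in `opaque` are just bytes, and recovering the fields means reverse-engineering. The text form carries its own field names, so losing the writing program does not lock up the data.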

Notes

  Things that can be stored in standard formats:
Line: 36 to 53
  Allow users to export from your program. Provide a means to dump data from your internal format to some standard format.
Deleted:
<
<
Respect REST: be aware of your URLs - they're important! URLs should reference data - not tools to provide data. http://www.w3.org/Provider/Style/URI.html
 -- TobyCabot - 22 Feb 2002

Revision 9 - 07 Apr 2003 - TobyCabot

Line: 1 to 1
 In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

Changed:
<
<
In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable, or the format that the data is stored in becomes indecipherable.
>
>
In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable (e.g. http://www.informationweek.com/story/IWK20010719S0003 ), or the format that the data is stored in becomes indecipherable.
  Media I've used:
  • cassette tape
Line: 36 to 36
  Allow users to export from your program. Provide a means to dump data from your internal format to some standard format.
Changed:
<
<
Respect REST: be aware of your URLs - they're important! URLs should reference data - not tools to provide data.
>
>
Respect REST: be aware of your URLs - they're important! URLs should reference data - not tools to provide data. http://www.w3.org/Provider/Style/URI.html
 

-- TobyCabot - 22 Feb 2002

Revision 8 - 02 Jan 2003 - TobyCabot

Line: 1 to 1
 In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

Line: 30 to 30
  Text wins over proprietary formats (see Project Gutenberg).
Added:
>
>
Documented proprietary formats win over undocumented formats (e.g. RTF over .DOC).
 Backup vs. Archive: short-term vs long-term, bulk data vs document-oriented.

Allow users to export from your program. Provide a means to dump data from your internal format to some standard format.
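
As a rough illustration of the export advice above, here is a minimal Python sketch. The "internal format" (a list of tuples), the column names, and the `export_csv` helper are all hypothetical; CSV stands in for "some standard format" only because it is plain text that almost any tool can read.

```python
import csv
import io

# Hypothetical internal format: whatever structure the program uses in memory.
INTERNAL_ADDRESS_BOOK = [
    (1, "Ada Lovelace", "London"),
    (2, "Grace Hopper", "New York"),
]


def export_csv(rows, out):
    """Dump internal rows to CSV with a header row naming each column."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "name", "city"])
    for row in rows:
        writer.writerow(row)


# Usage: export to an in-memory buffer; a real program would write to a file.
buffer = io.StringIO()
export_csv(INTERNAL_ADDRESS_BOOK, buffer)
exported = buffer.getvalue()
```

The header row matters as much as the data: it is the small piece of documentation that travels with the export.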

Revision 7 - 13 Dec 2002 - TobyCabot

Line: 1 to 1
 In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.

What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

Line: 23 to 23
 Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed.

Things that can be stored in standard formats:

Changed:
<
<
  • Time/Date
>
>
 

Revision 6 - 11 Dec 2002 - TobyCabot

Line: 1 to 1
Changed:
<
<
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper.
>
>
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them.
  What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable, or the format that the data is stored in becomes indecipherable.

Changed:
<
<
Media:
>
>
Media I've used:
 
  • cassette tape
  • 8-inch floppy
  • MO cartridge disk
Changed:
<
<
  • 1/4 cartridge tape
>
>
  • 1/4 cartridge tape (QIC)
 
  • 5 1/4-inch floppy (CP/M, Apple ][, trs-80, C64, PC)
  • 4MM DAT/DDS
  • 3 1/2-inch floppy
  • CDROM
Added:
>
>
  • DVD-ROM

Media I haven't, but know about:

  • 9-track tape
  • TZ/TK/DLT
  Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed.
Line: 25 to 30
  Text wins over proprietary formats (see Project Gutenberg).
Added:
>
>
Backup vs. Archive: short-term vs long-term, bulk data vs document-oriented.

Allow users to export from your program. Provide a means to dump data from your internal format to some standard format.

 Respect REST: be aware of your URLs - they're important! URLs should reference data - not tools to provide data.

-- TobyCabot - 22 Feb 2002

Revision 5 - 23 May 2002 - TobyCabot

Line: 1 to 1
 In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper.

What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

Line: 17 to 17
  Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed.
Added:
>
>
Things that can be stored in standard formats:
 
Added:
>
>
Text wins over proprietary formats (see Project Gutenberg).

Respect REST: be aware of your URLs - they're important! URLs should reference data - not tools to provide data.

 -- TobyCabot - 22 Feb 2002

Revision 4 - 27 Mar 2002 - TobyCabot

Line: 1 to 1
Changed:
<
<
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she had printed all of the documents on paper since she didn't trust the computer.
>
>
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper.
  What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.
Added:
>
>
In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable, or the format that the data is stored in becomes indecipherable.

Media:

  • cassette tape
  • 8-inch floppy
  • MO cartridge disk
  • 1/4 cartridge tape
  • 5 1/4-inch floppy (CP/M, Apple ][, trs-80, C64, PC)
  • 4MM DAT/DDS
  • 3 1/2-inch floppy
  • CDROM

Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed.

 
  • Time/Date
  • Country Codes, Currencies
  • Addresses (vCard/iCard)
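
For the first two items in the list above, a minimal Python sketch of what "stored in a standard format" can look like. The record and its field names are hypothetical; the formats themselves are the published standards ISO 8601 (dates and times), ISO 3166-1 alpha-2 (country codes), and ISO 4217 (currency codes).

```python
from datetime import datetime, timezone

# A fixed timestamp, chosen only so the example is reproducible.
stamp = datetime(2002, 2, 22, 15, 30, 0, tzinfo=timezone.utc)

# Hypothetical record: every value is a plain string in a published standard.
record = {
    "written": stamp.isoformat(),  # ISO 8601 date and time with UTC offset
    "country": "US",               # ISO 3166-1 alpha-2 country code
    "currency": "USD",             # ISO 4217 currency code
}

# Reading it back needs nothing beyond the standard itself.
restored = datetime.fromisoformat(record["written"])
```

A date written as `02/03/02` depends on who wrote it and where; the ISO 8601 string does not.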

Revision 3 - 27 Mar 2002 - TobyCabot

Line: 1 to 1
Changed:
<
<
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed.
>
>
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she had printed all of the documents on paper since she didn't trust the computer.
  What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

Revision 2 - 14 Mar 2002 - TobyCabot

Line: 1 to 1
 In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed.

What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

  • Time/Date
  • Country Codes, Currencies
Changed:
<
<
  • Addresses

http://www.w3.org/TR/xmlschema-0/

>
>
  -- TobyCabot - 22 Feb 2002

Revision 1 - 22 Feb 2002 - TobyCabot

Line: 1 to 1
Added:
>
>
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed.

What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.

  • Time/Date
  • Country Codes, Currencies
  • Addresses

http://www.w3.org/TR/xmlschema-0/

-- TobyCabot - 22 Feb 2002

Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.