Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. | ||||||||
Line: 80 to 80 | ||||||||
http://arstechnica.com/science/news/2010/11/preserving-science-how-data-gets-lost.ars - an article about this topic in the context of scientific research
http://blog.longnow.org/02014/02/24/iceisee-3-to-return-to-an-earth-no-longer-capable-of-speaking-to-it/ - a sad tale of a perfectly functional satellite having to be mothballed because we can no longer communicate with it. | ||||||||
Added: | ||||||||
> > | https://xkcd.com/1909/ - XKCD has a funny perspective on the subject |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Deleted: | ||||||||
< < | ||||||||
Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. | ||||||||
Line: 79 to 78 | ||||||||
http://lwn.net/Articles/240528/ - a link to an article on this topic by Jeremy Allison of the Samba team. Some of the comments are interesting, too.
http://arstechnica.com/science/news/2010/11/preserving-science-how-data-gets-lost.ars - an article about this topic in the context of scientific research | ||||||||
Added: | ||||||||
> > | http://blog.longnow.org/02014/02/24/iceisee-3-to-return-to-an-earth-no-longer-capable-of-speaking-to-it/ - a sad tale of a perfectly functional satellite having to be mothballed because we can no longer communicate with it. |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Deleted: | ||||||||
< < | Introduction | |||||||
Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end Microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. | ||||||||
Changed: | ||||||||
< < | Discussion | |||||||
In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable (i.e. http://www.informationweek.com/story/IWK20010719S0003 ), or the format that the data is stored in becomes indecipherable. While both reasons produce the same result, they are different in terms of the actions that you need to take to prevent them. | |||||||
> > | In my experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable (e.g. http://www.informationweek.com/story/IWK20010719S0003 ), or the format that the data is stored in becomes indecipherable. While both reasons produce the same result, they are different in terms of the actions that you need to take to prevent them. | |||||||
Media I've used:
| ||||||||
Line: 21 to 18 | ||||||||
| ||||||||
Added: | ||||||||
> > |
| |||||||
Media I haven't used, but know about:
| ||||||||
Line: 30 to 29 | ||||||||
| ||||||||
Changed: | ||||||||
< < | Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed. This happens frequently because the media and the machines that read it are physical devices and therefore age and degrade over time. A failure in either the media or the reader is enough to render the data lost forever, but if one floppy fails you lose only the data on that floppy but you lose it permanently. If the floppy reader fails then you lose access to all of the data on all of that type of media, but you can potentially get it back by finding someone else that has that type of reader and borrowing it from her. | |||||||
> > | Media can become unreadable if the media itself fails (magnets, scratches, click of death) or if the reader breaks and can't be fixed. The media and the machines that read it are physical devices and age and degrade over time. A failure in either the media or the reader is enough to render the data lost forever: if one floppy fails you lose only the data on that floppy but if your floppy reader fails then you lose access to all of the data on all of that type of media. You can potentially get it back by finding someone else that has that type of reader and borrowing it from her, but you don't want to count on that. | |||||||
Changed: | ||||||||
< < | The contents of the media can become unreadable, even if the media is readable, if there is no software that can decipher the format that the data was stored in. You can think of storing data in terms of encrypting it using an encryption algorithm. Some encryption algorithms are stronger (i.e. harder to decipher) than others, and some algorithms are more widely understood than others. The software that you use to read and write the data are the encryption key. If you understand the "encryption algorithm" that you're using to write the data then you will not have to worry about deciphering it, but if you don't understand the algorithm then you are dependent on that software to read it for you. My Mom was dependent on software that could read data encrypted in a certain Microsoft algorithm that neither she nor I understood, and over time Microsoft themselves "lost the key" to that algorithm so her data was permanently locked up and Mom didn't have the key. | |||||||
> > | Data can become unreadable, even if the media is readable, if there is no software that can decipher the format that the data was stored in. Think of storing data in terms of encrypting it using an encryption algorithm. Some encryption algorithms are stronger (i.e. harder to decipher) than others, and some algorithms are more widely understood than others. The software that you use to read and write the data is the encryption key. If you understand the "encryption algorithm" that you're using to write the data then you will not have to worry about deciphering it, but if you don't understand the algorithm then you are dependent on that software to read it for you. My Mom was dependent on software that could read data encrypted in a certain Microsoft algorithm that neither she nor I understood, and over time Microsoft themselves "lost the key" to that algorithm so her data was permanently locked up and Mom didn't have the key. | |||||||
This is the most important reason (among many) why you should never store any data in a format that's not well documented. If the only person that understands the format is the person or company that produced it, they can decide at any time that they don't want to support it anymore and you have very little recourse. You could try to figure out how the format works by looking at your documents, but the process (known as "reverse-engineering") is time-consuming and boring and may be illegal in some cases. Note that the key distinction is not whether the format is proprietary or non-proprietary, it's whether the designer of the format has provided enough documentation of it that other people can read and write it. Some proprietary formats, for example Adobe Portable Document Format, are very well documented so many tools can read and write them. |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | ||||||||
Line: 79 to 79 | ||||||||
http://lwn.net/Articles/240528/ - a link to an article on this topic by Jeremy Allison of the Samba team. Some of the comments are interesting, too. | ||||||||
Changed: | ||||||||
< < | -- TobyCabot - 22 Feb 2002 - 05 Jul 2007 | |||||||
> > | http://arstechnica.com/science/news/2010/11/preserving-science-how-data-gets-lost.ars - an article about this topic in the context of scientific research |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | ||||||||
Line: 77 to 77 | ||||||||
http://photoshopnews.com/?p=226 - this issue affects even cameras. Here's a story about an expensive camera that uses a proprietary format that can only be read by that vendor's software. | ||||||||
Added: | ||||||||
> > | http://lwn.net/Articles/240528/ - a link to an article on this topic by Jeremy Allison of the Samba team. Some of the comments are interesting, too. | |||||||
Changed: | ||||||||
< < | -- TobyCabot - 22 Feb 2002 - 13 Oct 2004 | |||||||
> > | -- TobyCabot - 22 Feb 2002 - 05 Jul 2007 | |||||||
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | ||||||||
Line: 41 to 41 | ||||||||
Notes | ||||||||
Added: | ||||||||
> > | In the summer of 2005 the Commonwealth of Massachusetts decided to use the OpenDocument document format instead of MS's proprietary formats. One of their key issues was the ability to read documents for a very long time. Here's an analysis of that decision: http://www.dwheeler.com/essays/why-opendocument-won.html | |||||||
Dublin Core Metadata Initiative (http://dublincore.org/) offers standards for encoding many different types of data, for example http://dublincore.org/documents/dcmi-terms/.
Things that can be stored in standard formats: |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | ||||||||
Line: 73 to 73 | ||||||||
http://www.theregister.co.uk/2005/02/21/forgetting_digital_memories/ - Digital memories: we can forget them for you wholesale! | ||||||||
Added: | ||||||||
> > | http://photoshopnews.com/?p=226 - this issue affects even cameras. Here's a story about an expensive camera that uses a proprietary format that can only be read by that vendor's software. | |||||||
-- TobyCabot - 22 Feb 2002 - 13 Oct 2004 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | ||||||||
Line: 41 to 41 | ||||||||
Notes | ||||||||
Added: | ||||||||
> > | Dublin Core Metadata Initiative (http://dublincore.org/) offers standards for encoding many different types of data, for example http://dublincore.org/documents/dcmi-terms/. | |||||||
Things that can be stored in standard formats: | ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
| ||||||||
Changed: | ||||||||
< < | Text wins over proprietary formats (see Project Gutenberg). | |||||||
> > | Text wins over proprietary formats (see IETF, Project Gutenberg). Documented proprietary formats win over undocumented formats (e.g. RTF over DOC). | |||||||
Changed: | ||||||||
< < | Documented proprietary formats win over undocumented formats (i.e. RTF over .DOC). | |||||||
> > | Lossless wins over lossy (e.g. FLAC over MP3). | |||||||
Backup vs. Archive: short-term vs long-term, bulk data vs document-oriented. |
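The Dublin Core note added in this revision can be made concrete with a minimal sketch. It builds a small XML metadata record using the Dublin Core element set namespace (`http://purl.org/dc/elements/1.1/`). The wrapper element name `metadata`, the helper `dublin_core_record`, and the sample values are assumptions made for illustration; they are not taken from the page or prescribed by the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set (title, creator, date, format, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Build an XML metadata record from a dict of Dublin Core element names.

    Hypothetical helper: the "metadata" wrapper element is an assumption,
    chosen only so the record has a single root.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are placeholders, not real catalogue data.
record = dublin_core_record({
    "title": "Example document",
    "creator": "Example Author",
    "date": "2002-02-22",
    "format": "text/plain",
})
print(record)
```

Because the record is plain XML with a publicly documented vocabulary, any XML parser can read it back without the program that wrote it, which is the property the notes above argue for.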
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | ||||||||
Line: 16 to 16 | ||||||||
| ||||||||
Added: | ||||||||
> > |
| |||||||
| ||||||||
Added: | ||||||||
> > |
| |||||||
Media I haven't used, but know about:
| ||||||||
Deleted: | ||||||||
< < |
| |||||||
| ||||||||
Line: 66 to 67 | ||||||||
http://www.ietf.org/internet-drafts/draft-ietf-geopriv-dhcp-civil-04.txt - an IETF draft for "civic location," also has some good references | ||||||||
Added: | ||||||||
> > | http://www.theregister.co.uk/2005/02/21/forgetting_digital_memories/ - Digital memories: we can forget them for you wholesale! | |||||||
-- TobyCabot - 22 Feb 2002 - 13 Oct 2004 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | ||||||||
Line: 64 to 64 | ||||||||
http://www.itl.nist.gov/div895/carefordisc/index.html - The US feds help keep your data safe. | ||||||||
Changed: | ||||||||
< < | -- TobyCabot - 22 Feb 2002 - 25 Jan 2004 | |||||||
> > | http://www.ietf.org/internet-drafts/draft-ietf-geopriv-dhcp-civil-04.txt - an IETF draft for "civic location," also has some good references
-- TobyCabot - 22 Feb 2002 - 13 Oct 2004 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
Changed: | ||||||||
< < | In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | |||||||
> > | In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetery down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | |||||||
What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. | ||||||||
Line: 35 to 35 | ||||||||
This is the most important reason (among many) why you should never store any data in a format that's not well documented. If the only person that understands the format is the person or company that produced it, they can decide at any time that they don't want to support it anymore and you have very little recourse. You could try to figure out how the format works by looking at your documents, but the process (known as "reverse-engineering") is time-consuming and boring and may be illegal in some cases. Note that the key distinction is not whether the format is proprietary or non-proprietary, it's whether the designer of the format has provided enough documentation of it that other people can read and write it. Some proprietary formats, for example Adobe Portable Document Format, are very well documented so many tools can read and write them. | ||||||||
Changed: | ||||||||
< < | The most important set of undocumented proprietary data formats are the Microsoft Office formats for documents, spreadsheets, presentations, etc. These formats are important because so much data is encoded in them every day, but they are not documented, and they change frequently. Many people spend many hours reverse-engineering them, but this effort is frustrated when a new version of the formats appears and the reverse-engineering process must start from scratch. | |||||||
> > | The most important set of undocumented proprietary data formats are the Microsoft Office formats for documents, spreadsheets, presentations, etc. These formats are important because so much data is encoded in them every day, but they are not documented, and they change frequently. Many people spend many man-years reverse-engineering them, but this effort is frustrated when a new version of the formats appears and the reverse-engineering process must start from scratch. And no, the XML-based Office 2003 formats are not any less proprietary than the previous binary ones. For one thing, they're still undocumented; for another, they're probably patented, so even if you were capable of figuring out how to read them it would be against the law for you to do so. | |||||||
Notes | ||||||||
Line: 61 to 62 | ||||||||
wrjpgcom is a tool to write data to the comment field of a jpeg image file. Need to find the source. | ||||||||
Changed: | ||||||||
< < | -- TobyCabot - 22 Feb 2002 | |||||||
> > | http://www.itl.nist.gov/div895/carefordisc/index.html - The US feds help keep your data safe.
-- TobyCabot - 22 Feb 2002 - 25 Jan 2004 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | ||||||||
Line: 59 to 59 | ||||||||
http://www.ecommercetimes.com/perl/story/31436.html | ||||||||
Added: | ||||||||
> > | wrjpgcom is a tool to write data to the comment field of a jpeg image file. Need to find the source. | |||||||
-- TobyCabot - 22 Feb 2002 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | ||||||||
Line: 57 to 57 | ||||||||
http://computerworld.co.nz/webhome.nsf/NL/A7D9D35CE6CC6DE3CC256D5F00728810 | ||||||||
Added: | ||||||||
> > | http://www.ecommercetimes.com/perl/story/31436.html | |||||||
-- TobyCabot - 22 Feb 2002 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | ||||||||
Line: 14 to 14 | ||||||||
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
| ||||||||
Line: 55 to 55 | ||||||||
http://slashdot.org/article.pl?sid=02/03/03/1821227&tid=126 - BBC digitizes old book, 15 years later the digital version is useless but the 1000-year-old book can still be read. | ||||||||
Added: | ||||||||
> > | http://computerworld.co.nz/webhome.nsf/NL/A7D9D35CE6CC6DE3CC256D5F00728810 | |||||||
-- TobyCabot - 22 Feb 2002 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Introduction | ||||||||
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | ||||||||
Line: 53 to 53 | ||||||||
Allow users to export from your program. Provide a means to dump data from your internal format to some standard format. | ||||||||
Added: | ||||||||
> > | http://slashdot.org/article.pl?sid=02/03/03/1821227&tid=126 - BBC digitizes old book, 15 years later the digital version is useless but the 1000-year-old book can still be read. | |||||||
-- TobyCabot - 22 Feb 2002 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Changed: | ||||||||
< < | In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | |||||||
> > | Introduction
In the physical world, entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | |||||||
What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. | ||||||||
Changed: | ||||||||
< < | In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable (i.e. http://www.informationweek.com/story/IWK20010719S0003 ), or the format that the data is stored in becomes indecipherable. | |||||||
> > | Discussion
In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable (i.e. http://www.informationweek.com/story/IWK20010719S0003 ), or the format that the data is stored in becomes indecipherable. While both reasons produce the same result, they are different in terms of the actions that you need to take to prevent them. | |||||||
Media I've used:
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
| ||||||||
Changed: | ||||||||
< < | Media I haven't, but know about: | |||||||
> > | Media I haven't used, but know about:
| |||||||
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
Changed: | ||||||||
< < | Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed. | |||||||
> > | This is the most important reason (among many) why you should never store any data in a format that's not well documented. If the only person that understands the format is the person or company that produced it, they can decide at any time that they don't want to support it anymore and you have very little recourse. You could try to figure out how the format works by looking at your documents, but the process (known as "reverse-engineering") is time-consuming and boring and may be illegal in some cases. Note that the key distinction is not whether the format is proprietary or non-proprietary, it's whether the designer of the format has provided enough documentation of it that other people can read and write it. Some proprietary formats, for example Adobe Portable Document Format, are very well documented so many tools can read and write them.
The most important set of undocumented proprietary data formats are the Microsoft Office formats for documents, spreadsheets, presentations, etc. These formats are important because so much data is encoded in them every day, but they are not documented, and they change frequently. Many people spend many hours reverse-engineering them, but this effort is frustrated when a new version of the formats appears and the reverse-engineering process must start from scratch.
Notes | |||||||
Things that can be stored in standard formats:
| ||||||||
Line: 36 to 53 | ||||||||
Allow users to export from your program. Provide a means to dump data from your internal format to some standard format. | ||||||||
Deleted: | ||||||||
< < | Respect REST: be aware of your URL's - they're important! URL's should reference data - not tools to provide data. http://www.w3.org/Provider/Style/URI.html | |||||||
-- TobyCabot - 22 Feb 2002 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. | ||||||||
Changed: | ||||||||
< < | In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable, or the format that the data is stored in becomes indecipherable. | |||||||
> > | In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable (i.e. http://www.informationweek.com/story/IWK20010719S0003 ), or the format that the data is stored in becomes indecipherable. | |||||||
Media I've used:
| ||||||||
Line: 36 to 36 | ||||||||
Allow users to export from your program. Provide a means to dump data from your internal format to some standard format. | ||||||||
Changed: | ||||||||
< < | Respect REST: be aware of your URL's - they're important! URL's should reference data - not tools to provide data. | |||||||
> > | Respect REST: be aware of your URL's - they're important! URL's should reference data - not tools to provide data. http://www.w3.org/Provider/Style/URI.html | |||||||
-- TobyCabot - 22 Feb 2002 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. | ||||||||
Line: 30 to 30 | ||||||||
Text wins over proprietary formats (see Project Gutenberg). | ||||||||
Added: | ||||||||
> > | Documented proprietary formats win over undocumented formats (i.e. RTF over .DOC). | |||||||
Backup vs. Archive: short-term vs long-term, bulk data vs document-oriented. Allow users to export from your program. Provide a means to dump data from your internal format to some standard format. |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. | ||||||||
Line: 23 to 23 | ||||||||
Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed. Things that can be stored in standard formats: | ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
|
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Changed: | ||||||||
< < | In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. | |||||||
> > | In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. If she had used the computer the way that the manufacturers wanted her to she would have either lost the documents or I would have spent a lot of time scraping the data out of them. | |||||||
What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable, or the format that the data is stored in becomes indecipherable. | ||||||||
Changed: | ||||||||
< < | Media: | |||||||
> > | Media I've used: | |||||||
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
| ||||||||
Added: | ||||||||
> > |
| |||||||
Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed. | ||||||||
Line: 25 to 30 | ||||||||
Text wins over proprietary formats (see Project Gutenberg). | ||||||||
Added: | ||||||||
> > | Backup vs. Archive: short-term vs long-term, bulk data vs document-oriented. Allow users to export from your program. Provide a means to dump data from your internal format to some standard format. | |||||||
Respect REST: be aware of your URL's - they're important! URL's should reference data - not tools to provide data.
-- TobyCabot - 22 Feb 2002 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. | ||||||||
Line: 17 to 17 | ||||||||
Media can become unreadable if the media itself fails (magnets, scratches) or if the reader breaks and can't be fixed. | ||||||||
Added: | ||||||||
> > | Things that can be stored in standard formats: | |||||||
| ||||||||
Added: | ||||||||
> > | Text wins over proprietary formats (see Project Gutenberg). Respect REST: be aware of your URL's - they're important! URL's should reference data - not tools to provide data. | |||||||
-- TobyCabot - 22 Feb 2002 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Changed: | ||||||||
< < | In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she had printed all of the documents on paper since she didn't trust the computer. | |||||||
> > | In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she lost nothing since she had printed all of the documents on paper. | |||||||
What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. | ||||||||
Added: | ||||||||
> > | In my (admittedly brief) experience with computers, data becomes inaccessible for two reasons: the physical media that it's stored on becomes unreadable, or the format that the data is stored in becomes indecipherable.
Media:
| |||||||
|
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Changed: | ||||||||
< < | In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. | |||||||
> > | In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed. She was lucky to be computer-naive - she had printed all of the documents on paper since she didn't trust the computer. | |||||||
What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it. |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed.
What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
-- TobyCabot - 22 Feb 2002 |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Added: | ||||||||
> > | In the physical world (3space, meat-space), entropy is the unavoidable consequence of the 2nd law of thermodynamics. In theory the 2nd law doesn't hold in cyberspace, but in practice it does. Information that's stored in computers "rusts" at least as fast as information stored in real-world media. I'm reminded of this every time I walk through the cemetary down the street from my house: many headstones manufactured in the 17th century still convey the information that they did when they were new. On the other hand, when my Mom "upgraded" to Windows '95 a few years back she found that all of the documents she had written in her low-end microsoft word processor were completely illegible on the new version of the high-end office suite, even though only 5 years had passed.
What can you do? To start with, be aware of information entropy, and decide whether or not you care about it. If you don't care then you're wasting your time reading this document; if you do then I hope that you'll learn something from it.
|