About the Internet Archive
Why the Archive is Building an 'Internet Library'
Libraries exist to preserve society's cultural artifacts and to provide access to them. If libraries are to continue to foster education and scholarship in this era of digital technology, it's essential for them to extend those functions into the digital world.
Many early movies were recycled to recover the silver in the film. The Library of Alexandria - an ancient center of learning containing a copy of every book in the world - was eventually burned to the ground. Even now, at the turn of the 21st century, no comprehensive archives of television or radio programs exist.
But without cultural artifacts, civilization has no memory and no mechanism to learn from its successes and failures. And paradoxically, with the explosion of the Internet, we live in what Danny Hillis has referred to as our "digital dark age."
The Internet Archive is working to prevent the Internet - a new medium with major historical significance - and other "born-digital" materials from disappearing into the past. Collaborating with institutions including the Library of Congress and the Smithsonian, we are working to preserve a record for generations to come.
Open and free access to literature and other writings has long been considered essential to education and to the maintenance of an open society. Public and philanthropic enterprises have supported it through the ages.
The Internet Archive is opening its collections to researchers, historians, and scholars. The Archive has no vested interest in the discoveries of the users of its collections, nor is it a grant-making organization.
At present, the size of our Web collection is such that using it requires programming skills. However, we are hopeful about the development of tools and methods that will give the general public easy and meaningful access to our collective history. In addition to developing our own collections, we are working to promote the formation of other Internet libraries in the United States and elsewhere.
Internet libraries raise many issues in a range of areas, including archiving technology, copyright, privacy and free speech, trademark, trade secrets, import/export issues, stolen property, pornography, the question of who will have access to the libraries, and more.
Below are links to projects, resources, and institutions related to Internet libraries.
Internet Libraries and Librarianship
Archiving Technology
Internet Mapping
Internet Statistics
Copyright
Privacy and Free Speech
Internet Libraries and Librarianship
Alexa Internet has catalogued Web sites and provides this information in a free service.
www.alexa.com
The American Library Association is a major trade association of American libraries.
www.ala.org
The Australian National Library collects material including organizational Web sites.
pandora.nla.gov.au/documents.html
The Council on Library and Information Resources works to ensure the well-being of the scholarly communication system.
www.clir.org
See its publication Why Digitize? at
www.clir.org/pubs/reports/pub80-smith/pub80.html
The Digital Library Forum (D-Lib) publishes an online magazine and other resources for building digital libraries.
www.dlib.org
Attorney I. Trotter Hardy explains copyright law and examines its implications for digital materials in his paper Internet Archives and Copyright.
copyright_TH.php
The Internet Public Library site has many links to online resources for the general public.
www.ipl.org
Brewster Kahle is a founder of WAIS Inc. and Alexa Internet and chairman of the board of the Internet Archive. See his paper The Ethics of Digital Librarianship at
ethics_BK.php
Michael Lesk of the National Science Foundation has written extensively on digital archiving and digital libraries.
www.purl.net/NET/lesk
The Library of Congress is the national library of the United States.
www.loc.gov
The Museum Digital Library plans to help digitize collections and provide access to them.
www.digitalmuseums.org
The National Archives and Records Administration oversees the management of all US federal records. It also archives federal Web sites including the Clinton White House site.
www.nara.gov
The National Science Foundation Digital Library Program has funded academic research on digital libraries.
www.nsf.gov/home/crssprgm/dli/start.htm
National Technical Information Service (NTIS), U.S. Department of Commerce, Technology Administration. NTIS is an archive and distributor of scientific, technical, engineering and business related information developed by and for the federal government.
www.ntis.gov
Network Wizards has been tracking Internet growth for many years.
www.nw.com
Project Gutenberg is making ASCII versions of classic literature openly available.
www.gutenberg.org
The Radio and Television Archive has many links to related resources.
www.rtvf.unt.edu/links/histsites.htm
Revival of the Library of Alexandria is a project to revive the ancient library in Egypt.
www.bibalex.org
The Society of American Archivists is a professional association focused on ensuring the identification, preservation, and use of records of historical value.
www.archivists.org
The Royal Institute of Technology Library in Sweden is creating a system of quality-assessed information resources on the Internet for academic use.
www.lib.kth.se/main/eng
The United States Government Printing Office produces and distributes information published by the US government.
www.access.gpo.gov
The University of Virginia is building a catalog of digital library activities.
http://www.lib.virginia.edu/digital/
Archiving Technology
The Association for Computing Machinery (ACM) computing and public policy page includes papers and news on pending legislation on issues including universal access, copyright and intellectual property, free speech and the Internet, and privacy.
www.acm.org/serving
The Carnegie Mellon University Informedia Digital Video Library Project is studying how multimedia digital libraries can be established and used.
www.informedia.cs.cmu.edu
The Intermemory Project aims to develop highly survivable and available storage systems.
www.intermemory.org
The National Film Preservation Board, established by the National Film Preservation Act of 1988, works with the Library of Congress to study and implement plans for film and television preservation. The site's research page includes links to the board's 1993 film preservation study, a 1994 film preservation plan, and a 1997 television and video study. All the documents warn of the dire state of film and television preservation in the United States.
lcweb.loc.gov/film/filmpres.html
The National Institute of Standards and Technology (NIST) posts IEC International Standard names and symbols for prefixes for binary multiples for use in data processing and data transmission.
www.physics.nist.gov/cuu/Units/binary.html
The Text Retrieval Conference (TREC) encourages research in information retrieval from large text collections.
trec.nist.gov
Internet Mapping
An Atlas of Cyberspaces has maps and dynamic tools for visualizing Web browsing.
www.cybergeography.com/atlas/surf.html
The Internet Mapping Project is a long-term project by a scientist at Bell Labs to collect routing data on the Internet.
www.cs.bell-labs.com/who/ches/map
The Matrix Information Directory Service has good maps and visualizations of the networked world.
www.mids.org
Peacock Maps has maps of Internet connectivity.
www.peacockmaps.com
Internet Statistics
WebReference has an Internet statistics page (publisher: Internet.com).
webreference.com/internet/statistics.html
Copyright
The Association for Computing Machinery (ACM) copyright information page includes text of pertinent laws and pending legislation.
www.acm.org/usacm/copyright
Tom W. Bell teaches intellectual property and Internet law at Chapman University School of Law.
www.tomwbell.com
His site includes a graph showing the trend of the maximum US copyright term at
www.tomwbell.com/writings/(C)_Term.html
Cornell University posts the text of copyright law at
www4.law.cornell.edu/uscode/unframed/17/107.html
www4.law.cornell.edu/uscode/unframed/17/108.html
The Digital Future Coalition is a nonprofit working on the issues of copyright in the digital age.
www.dfc.org
The National Academy Press is the publishing arm of the national academies.
"The Digital Dilemma: Intellectual Property in the Information Age"
http://www.nap.edu/html/digital_dilemma/
"LC21: A Digital Strategy for the Library of Congress"
www.nap.edu/books/0309071445/html
Pamela Samuelson is a professor in the School of Information Management and Systems at UC Berkeley.
info.berkeley.edu/~pam
Title 17 of US copyright code
www.loc.gov/copyright/title17/
US Government Copyright Office
www.loc.gov/copyright
Privacy and Free Speech
The Association for Computing Machinery (ACM) free-speech information page includes the text of pertinent laws and pending legislation.
www.acm.org/usacm/speech
The Association for Computing Machinery (ACM) privacy information page includes the text of congressional testimony and links to other resources.
www.acm.org/usacm/privacy
The Benton Foundation Communications Policy and Practice Program has the goal of infusing the emerging communications environment with public-interest values.
www.benton.org/cpphome.html
The Center for Democracy and Technology works to promote democratic values and constitutional liberties in the digital age.
www.cdt.org
The Computers Freedom and Privacy Conference has a site containing information on each annual conference held since 1991.
www.cfp.org
The Electronic Frontier Foundation works to protect fundamental civil liberties, including privacy and freedom of expression in the arena of computers and the Internet.
www.eff.org
The Electronic Privacy Information Center, a project of the Fund for Constitutional Government, is a public-interest research center whose goal is to focus public attention on emerging civil liberties issues and to protect privacy, the First Amendment, and constitutional values.
www.epic.org
The Free Expression Policy Project is a think tank on artistic and intellectual freedom at NYU's Brennan Center for Justice. Through policy research and advocacy, they explore freedom of expression issues including censorship, copyright law, media localism, and corporate media reform.
www.fepproject.org
The Internet Free Expression Alliance is an information and advocacy organization focused on free speech as it relates to the Internet.
www.ifea.net
The Internet Privacy Coalition aims to protect privacy on the Internet by promoting the widespread availability of strong encryption and the relaxation of export controls on cryptography.
www.privacy.org/ipc
The Privacy Page includes news, alerts, and links to privacy-related resources. Related organizations include the Electronic Privacy Information Center, the Internet Privacy Coalition, and Privacy International.
www.privacy.org
Privacy International is a London-based human rights group formed as a watchdog on surveillance by governments and corporations.
www.privacy.org/pi
Please suggest other pages that may be appropriate here.
Storage and Preservation
The Archive has two practical considerations in dealing with digital collections:
How to store massive amounts of data
How to preserve the data for posterity
Storing the Archive's collections involves parsing, indexing, and physically encoding the data. With the Internet collections growing at exponential rates, this task poses an ongoing challenge.
Our hardware consists of PCs with clusters of IDE hard drives. Data is stored on DLT tape and hard drives in various appropriate formats, depending on the collection. Web data is received and stored in an archive format: 100-megabyte ARC files, each made up of many individual files. Alexa Internet (currently the source of all crawls in our collections) is proposing ARC as a standard for archiving Internet objects. See Alexa for the format specification.
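As a rough illustration of the idea of gathering many individual files into containers of about 100 megabytes, here is a minimal Python sketch. It does not reproduce the ARC format itself (see the specification for that). The names Container and pack, the decimal reading of "100-megabyte", and the rule for when to start a new container are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Assumption: "100-megabyte" is read as 100 * 10^6 bytes; the text does not
# say whether the decimal or binary meaning is intended.
TARGET_BYTES = 100 * 1000 * 1000


@dataclass
class Container:
    """One hypothetical container holding many captured objects."""
    names: List[str] = field(default_factory=list)
    size: int = 0


def pack(records: Iterable[Tuple[str, int]],
         target: int = TARGET_BYTES) -> List[Container]:
    """Group (name, size_in_bytes) records, in order, into containers.

    A new container is started when adding the next record would push the
    current one past the target. A single record larger than the target
    gets a container to itself.
    """
    containers: List[Container] = []
    current = Container()
    for name, size in records:
        if current.names and current.size + size > target:
            containers.append(current)
            current = Container()
        current.names.append(name)
        current.size += size
    if current.names:
        containers.append(current)
    return containers


if __name__ == "__main__":
    crawl = [("a.html", 60_000_000), ("b.gif", 30_000_000), ("c.html", 20_000_000)]
    for i, c in enumerate(pack(crawl)):
        print(i, c.names, c.size)
```

Keeping records in crawl order, rather than sorting them to fill containers tightly, is a choice made here for simplicity; a real system may pack differently.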
Preservation is the ongoing task of permanently protecting stored resources from damage or destruction. The main issues are guarding against the consequences of accidents and data degradation and maintaining the accessibility of data as formats become obsolete.
Accidents: Any medium or site used to store data is potentially vulnerable to accidents and natural disasters. Maintaining copies of the Archive's collections at multiple sites can help alleviate this risk. Part of the collection is already handled this way, and we are proceeding as quickly as possible to do the same with the rest.
Migration: Over time, storage media can degrade to a point where the data becomes permanently irretrievable. Although DLT tape is rated to last 30 years, the industry rule of thumb is to migrate data every 10 years. We no longer use tapes for storage, however. Please take a look at our page on our Petabox system for more information on our storage systems.
Data formats: As advances are made in software applications, many data formats become obsolete. We will be collecting software and emulators that will aid future researchers, historians, and scholars in their research.
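One common way to notice damage or degradation in copies held at multiple sites is to record a checksum when data is first stored and compare each copy against it later. The Python sketch below illustrates that general technique only. The text does not say how the Archive verifies its copies, so the choice of SHA-256, the function names, and the site names are all assumptions.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def find_damaged(expected_digest: str, copies: Dict[str, bytes]) -> List[str]:
    """Return the sorted names of sites whose copy no longer matches.

    expected_digest is the digest recorded when the data was first stored;
    copies maps a (hypothetical) site name to the bytes held at that site.
    """
    return sorted(
        site for site, data in copies.items()
        if sha256_hex(data) != expected_digest
    )


if __name__ == "__main__":
    original = b"archived page"
    recorded = sha256_hex(original)
    copies = {
        "site_a": original,
        "site_b": b"archived pagX",  # simulated degradation of one byte
        "site_c": original,
    }
    print(find_damaged(recorded, copies))
```

A copy flagged this way could then be restored from a site whose copy still matches, and the same check can be rerun after data is migrated to new media.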
Find out
How to get free access to the Archive's Internet collections
About our announcement and discussion lists on Internet libraries and movie archives