From the very beginning of civilisation, the quest for knowledge has enabled human progress. It is important for humanity to conserve and preserve the vast body of knowledge accumulated over thousands of years. Libraries and museums have long preserved many kinds of information. With the advent of the internet and the digitisation of the world, it has become important to create backups of website content as well.
Web archiving serves the same purpose as traditional libraries do; the difference is that it collects and preserves the content of the World Wide Web in an archival format. Information stored in this format can be accessed and reused easily.
The Web is Big: How do Archivists Select the Data?
Web archivists generally use an automated process to collect websites. Specially designed software known as 'crawlers' traverses the internet, copying and saving data. Crawlers faithfully download code, images, documents and other files. This process is known as harvesting. Websites preserved in this manner are typically snapshots of the data at a point in time. An archived website can be viewed on the internet exactly as the active website appeared. The organisation 'Internet Archive' is on a quest to archive the entire World Wide Web.
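The harvesting step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production crawler: it only shows how a crawler discovers new URLs by parsing the links out of a fetched page (a real archiver would fetch pages in a loop, respect robots.txt, and write the responses to an archival container such as WARC). The function name `extract_links` and the example URLs are assumptions for illustration.

```python
# Minimal sketch of a crawler's link-discovery step, using only the
# Python standard library. A real harvester would fetch each page,
# store it, and feed the discovered links back into a crawl queue.
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links ("/about") against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all outgoing links found in an HTML page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

For example, given a saved page containing `<a href="/about">About</a>`, `extract_links(page, "https://example.com/")` resolves it to the absolute URL `https://example.com/about`, which the crawler would then queue for harvesting.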
Let us Delve Deeper into the Question: Why Archive Websites?
A website gives a company its identity, and sometimes it becomes a unique source of information. Website archiving also matters in legal contexts: it protects an organisation's content against false claims and safeguards its intellectual property rights.
Three Types of Archiving
1. Client side: This is the most common type of web archiving. It has the advantage that it can be done remotely.
2. Transaction based: This requires cooperation from the server owners.
3. Server side: This too requires collaboration from the server owners.
How to Make Archiving Easier
- Make it easy for crawlers to find your website. It is advisable to keep the entire content under one root URL. User-friendly URLs also help with search engine optimisation and usability.
- The website should adhere to the standards published by the W3C.
- Using XML sitemaps is best practice, as it improves the findability of the website by listing and linking all of its content.
- Some websites rely on features such as log-ins, tick boxes or search boxes. These cannot be operated by machines, so crawlers will not be able to reach the content behind them. Using a sitemap or an A–Z list is a better option.
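The sitemap recommendation above can be sketched concretely. The snippet below is a minimal, hypothetical generator that produces a `sitemap.xml` in the sitemaps.org format (`<urlset>` containing one `<url>`/`<loc>` entry per page); the function name `build_sitemap` and the example URLs are assumptions for illustration.

```python
# Minimal sketch: build a sitemap.xml string so crawlers can discover
# every page of a site, using only the standard library.
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document listing each URL under <urlset>."""
    ET.register_namespace("", SITEMAP_NS)  # serialise without a prefix
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(entry, "{%s}loc" % SITEMAP_NS).text = url
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)
```

Serving the resulting file at the site root (conventionally `/sitemap.xml`) gives a crawler a complete list of pages, including any that sit behind search boxes or other interactive navigation it cannot operate.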
Collaborations in The Field of Web Archiving
IWAW (International Web Archiving Workshop) began in 2001 and provides a platform to share and discuss ideas about web archiving. International collaboration in the field received a major thrust when a new organisation, the International Internet Preservation Consortium, was founded in 2003.