The Internet Archive has just announced that it has received a grant to fund a significant update to its popular Wayback Machine. The plan is to revamp the website, including a complete rewrite of the code, and to add a search feature that will let users find websites by keyword.
The Wayback Machine was launched on 24 January 1996 by Brewster Kahle and Bruce Gilliat, who developed the software to crawl and download all accessible web pages. By 2015, the Wayback Machine had archived over 452 billion pages, and the index is growing by over 20 terabytes of data each week. You can see a screenshot of the Wayback Machine in action below (it actually shows a version of archive.org two days after the site went live):
The redeveloped site is scheduled to launch in 2017, thanks to the grant from the Laura and John Arnold Foundation (LJAF). The goal of the project is to preserve the world’s cultural heritage and to tackle the problem that the average web page lasts just one hundred days before it is either altered or deleted (an estimate by the Internet Archive). Kelli Rhee, LJAF Vice President of Venture Development, commented, “The Internet Archive is helping to preserve the world’s digital history in a transformational way. Taking the Wayback Machine to the next level will make the entire Web more reliable, transparent and accessible for everyone.”
The Wayback Machine Project goals:
- Highlighting the provenance of pages found in the Wayback Machine – This will increase the visibility of the partners that select websites and web pages for collection.
- Rewriting the Wayback Machine code – Rewriting the website code from scratch will allow it to be built more efficiently to modern standards, improving performance, reliability and functionality.
- Optimizing the scope and quality of pages we crawl – Crawling more efficiently will allow more pages to be captured and will also improve the quality of what is captured. Currently, the Internet Archive stores about one billion pages per week.
- Improving the playback of media-rich and interactive websites – Websites are changing rapidly and now use many different media formats, so the project aims to support many more of them.
- Updating the user interface – Improving the user interface will make it easier for users to discover archived websites.
- Finding sites based on keywords – This is one of the most exciting features. Being able to search for old sites and web pages by keyword will be a powerful addition.
- Partnering with other services to repair broken links by pointing to the Wayback Machine – By working with organizations such as the Wikimedia Foundation (which runs Wikipedia), the Internet Archive will be able to replace broken links with archived ones.
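As a rough illustration of what link repair involves, here is a minimal sketch that rewrites broken links into Wayback Machine replay URLs of the form `https://web.archive.org/web/<timestamp>/<original URL>`, where the timestamp is `YYYYMMDDhhmmss`. This is not how Wikipedia or the Internet Archive actually implement it: the helpers `wayback_url` and `repair_links` are hypothetical, and deciding which links are broken is left to a caller-supplied `is_broken` check (a real tool would test each URL over HTTP).

```python
from datetime import datetime
from typing import Callable, Iterable, List


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine replay URL for original_url near the given time (hypothetical helper)."""
    # Timestamp format is YYYYMMDDhhmmss; the Wayback Machine redirects
    # to the nearest capture it holds for that URL, if there is one.
    return f"https://web.archive.org/web/{when:%Y%m%d%H%M%S}/{original_url}"


def repair_links(
    links: Iterable[str], is_broken: Callable[[str], bool], when: datetime
) -> List[str]:
    """Replace each broken link with an archived copy; keep working links unchanged."""
    return [wayback_url(url, when) if is_broken(url) else url for url in links]


if __name__ == "__main__":
    # Assumption for the demo: we already know which links are dead.
    dead = {"http://example.com/old-page"}
    fixed = repair_links(
        ["http://example.com/old-page", "https://example.org/"],
        lambda url: url in dead,
        datetime(2016, 1, 1),
    )
    print(fixed)
    # ['https://web.archive.org/web/20160101000000/http://example.com/old-page', 'https://example.org/']
```

Keeping the "is this link broken?" decision outside the rewriting function makes the sketch easy to test offline and lets a real tool plug in whatever liveness check it prefers.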
If you are a developer and the prospect of helping with the redevelopment excites you, you may be interested to know that the Internet Archive was recruiting a senior product manager to lead it (Update: the job advert has since been removed).
In time, we expect the Wayback Machine to become an important part of the web. Indeed, there have already been many court cases that sought to rely on records of past web pages. It is not without its issues, though. Its legal status in Europe means that it is probably infringing copyright law there. As a result, it has a clear exclusion policy: it follows any restrictions placed in a website's robots.txt file. The policy also works retrospectively, so adding an exclusion to a robots.txt file will remove any previously archived copies as well.
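For site owners wondering how such an exclusion is expressed, Python's standard `urllib.robotparser` module can check what a robots.txt file says about a given crawler. This is a minimal sketch assuming the archive crawler identifies itself as `ia_archiver`, the user-agent token the Internet Archive has historically honored; the helper `archive_allowed` is hypothetical, and the sketch only reads the rules in the file, not what the Wayback Machine actually does with them.

```python
from urllib.robotparser import RobotFileParser


def archive_allowed(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Hypothetical helper: does this robots.txt let `agent` fetch `url`?"""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Example robots.txt that blocks the (assumed) archive crawler from one directory only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /private/
"""

if __name__ == "__main__":
    print(archive_allowed(ROBOTS_TXT, "https://example.com/private/notes.html"))  # False
    print(archive_allowed(ROBOTS_TXT, "https://example.com/index.html"))  # True
```

Replacing `Disallow: /private/` with `Disallow: /` would exclude the whole site from that crawler, while other crawlers not named in the file remain unaffected.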
Regardless of the various legalities surrounding it, we know that it is an excellent resource with great intentions. As such, we fully support the site.