1. What data is returned from a web crawl?
80legs can return almost any data found at a crawled URL, but it's up to you to tell each web crawl what data to collect. You do this by specifying an 80app to use with your crawl. The 80app determines what data is returned from each URL you crawl.
We regularly update our public repository with new 80apps that return different sorts of data.
You can add these 80apps, or your own, through the web portal or the API.
2. What format is the data?
Data from 80legs is returned in JSON format by default, but CSV is available in certain cases. Here's an example of the JSON data:
Note that each crawled URL appears with the URL itself and the data returned from it. The result value is always a string, regardless of the format of the data returned from the crawled URL.
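If you process results in code, the fact that the result is always a string matters: an 80app that emits structured data needs a second decoding step. Here is a minimal Python sketch, assuming (for illustration only) that a result file is a JSON array of objects with `url` and `result` keys. The function name `load_results` is hypothetical.

```python
import json

def load_results(text):
    """Parse a result file into (url, data) pairs.

    Assumption: the file is a JSON array of {"url": ..., "result": ...} objects.
    """
    parsed = []
    for record in json.loads(text):
        result = record.get("result")
        # The result value is always a string; try to decode it as JSON
        # in case the 80app serialized structured data into it.
        try:
            data = json.loads(result) if isinstance(result, str) else result
        except json.JSONDecodeError:
            data = result  # plain-text result, keep it as-is
        parsed.append((record.get("url"), data))
    return parsed

sample = '[{"url": "http://example.com", "result": "{\\"title\\": \\"Example\\"}"}]'
print(load_results(sample))  # [('http://example.com', {'title': 'Example'})]
```

Falling back to the raw string keeps plain-text results intact instead of failing on them.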
In some cases, your crawl may encounter errors when trying to access a URL. If this happens, you'll see something like this in your crawl results:
If you're unfamiliar with JSON, we recommend these resources:
- JSON Documentation
- What is JSON and how to use it
- Our source code for converting JSON to CSV
- Top 5 Free JSON Viewers
- An online tool for converting JSON to CSV
You can also open any JSON file in a basic text editor like Notepad, although we recommend EditPad for larger files.
If you're using one of our default 80apps, a CSV conversion option is available through the web portal. We provide CSV conversion only for default 80apps because for those apps we know which columns the returned data needs.
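With a custom 80app there is no built-in CSV option, but you can convert the JSON yourself once you decide on the columns. The Python sketch below is not the 80legs converter. It uses the standard `csv` module and assumes each record is an object with `url` and `result` keys, where `result` is a string holding a flat JSON object. The function name `results_to_csv` is hypothetical, and the columns are collected from the keys found across all results.

```python
import csv
import io
import json

def results_to_csv(records):
    """Flatten crawl records into CSV text, one row per crawled URL.

    Assumption: each record's "result" is a string containing a flat JSON object.
    """
    rows, columns = [], []
    for record in records:
        data = json.loads(record["result"])
        for key in data:
            # Keep first-seen column order; "url" is reserved for the crawled URL.
            if key != "url" and key not in columns:
                columns.append(key)
        rows.append({**data, "url": record["url"]})
    out = io.StringIO()
    # restval="" leaves a blank cell when a result lacks one of the columns.
    writer = csv.DictWriter(out, fieldnames=["url"] + columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

records = [
    {"url": "http://a.example", "result": '{"title": "A", "links": 3}'},
    {"url": "http://b.example", "result": '{"title": "B"}'},
]
print(results_to_csv(records))
```

Deriving the columns from the data is convenient for quick exports. For a fixed schema, you would pass an explicit column list instead.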
3. When are results posted?
80legs posts results for your crawl whenever either of these conditions is met:
- Your crawl has completed or been canceled.
- At least 100 MB of new data have been collected for your crawl.
4. How long are results available?
Each result file is available for 7 days after it is created. Because result files are posted while the crawl is still running, files from a long crawl can expire before the crawl finishes, so download crawl results as soon as they become available.
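If you download result files on a schedule, it can help to track each file's deadline. Here is a minimal sketch using Python's `datetime`. The 7-day window comes from this FAQ. The code assumes you record each file's creation time yourself, and the helper names `expires_at` and `time_left` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # results are available for 7 days after creation

def expires_at(created):
    """Return the time a result file stops being available."""
    return created + RETENTION

def time_left(created, now=None):
    """Return the remaining download window (negative once the file has expired)."""
    now = now or datetime.now(timezone.utc)
    return expires_at(created) - now

created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expires_at(created))  # 2024-01-08 12:00:00+00:00
print(time_left(created, now=datetime(2024, 1, 6, 12, 0, tzinfo=timezone.utc)))  # 2 days, 0:00:00
```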