PHP Classes

Sorcerer: Scrape Web page content using regular expressions

Version: sorcerer 1.0.0
License: MIT/X Consortium ...
PHP version: 5
Categories: PHP 5, Web services
Description 

This class can scrape Web page content using regular expressions.

It takes a given page URL and retrieves its contents.

The class applies a given list of regular expressions to the page content and can save the matches to a given file.
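
The flow described above can be sketched in plain PHP. This is a minimal illustration of the general technique, not the Sorcerer source: the function name `extract_matches`, the pattern, and the sample markup are all hypothetical, and the type declarations assume PHP 7.0 or later. In real use the HTML string would come from something like `file_get_contents($url)`.

```php
<?php
// Hypothetical helper: apply each regular expression to the page source
// and collect every match, keyed by the pattern that produced it.
function extract_matches(string $html, array $regexes): array
{
    $results = [];
    foreach ($regexes as $regex) {
        // PREG_SET_ORDER groups the full match and its captures per hit.
        $found = preg_match_all($regex, $html, $matches, PREG_SET_ORDER);
        // preg_match_all() returns false on error; treat that as no matches.
        $results[$regex] = $found ? $matches : [];
    }
    return $results;
}

// Inline sample instead of a network request, so the sketch runs offline.
// For a live page, $html could come from file_get_contents($url).
$html = '<p><a href="/one">First</a></p>' . "\n"
      . '<p><a href="/two">Second</a></p>';

$results = extract_matches($html, ['/<a\s[^>]*>(.*?)<\/a>/i']);

foreach ($results as $regex => $hits) {
    foreach ($hits as $hit) {
        echo $hit[1], PHP_EOL; // first capture group: the link text
    }
}
```

Keying the results by pattern keeps it clear which expression produced which match when several are supplied.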

Name: Gavin Gordon Markowski

Documentation

Sorcerer

Description

An easy-to-use PHP class for scraping webpages' source code.

Usage

Installation

	$ composer require gavinggordon/sorcerer

Examples

Instantiation

	include( 'vendor/autoload.php' );

	use GGG\Http\Data\Collection\Sorcerer as Sorcerer;
	
	$scraper = new Sorcerer();

Configuration

	$url = 'http://www.testurl.com/index.php';
	
	$regexes = [
		'/\<a\s?[^\>]+?\>(.+)\<\/a\>/i',
		'/\<img\s?([^\>]+?)[\s\/]*?\>/i'
	];
	
	// __DIR__ has no trailing slash, so the separator goes at the start of the filename.
	$savefile = __DIR__ . '/testurl-scrapedata.txt';
	
	$scraper->configure( $url, $regexes, $savefile );
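
To see what the two example patterns capture, they can be run directly with `preg_match_all()` against a small string. The sketch below is independent of Sorcerer. The markup is made up for illustration, and how the class itself applies the patterns is not shown here.

```php
<?php
// The two patterns from the configuration example, copied verbatim.
$linkPattern = '/\<a\s?[^\>]+?\>(.+)\<\/a\>/i';
$imgPattern  = '/\<img\s?([^\>]+?)[\s\/]*?\>/i';

// Made-up markup for illustration: one tag per line.
$html = '<a href="/about">About us</a>' . "\n"
      . '<img src="logo.png" alt="Logo" />';

preg_match_all($linkPattern, $html, $links);
preg_match_all($imgPattern, $html, $images);

// Index 1 holds the first capture group of every match.
echo $links[1][0], PHP_EOL;   // link text
echo $images[1][0], PHP_EOL;  // attribute string of the <img> tag

// Two links on one line: the greedy (.+) runs to the last </a>.
$sameLine = '<a href="/a">A</a> <a href="/b">B</a>';
preg_match_all($linkPattern, $sameLine, $greedy);
echo count($greedy[0]), PHP_EOL; // a single match spanning both links
```

If one match per link is wanted when several links share a line, a lazy group such as `(.+?)` is a common alternative.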

Run

If no filepath was set for "$savefile",...

	$data = $scraper->scrape();
	
	print_r( $data );

...the scraped data will be returned.

If a filepath was set for "$savefile",...

	$scraper->scrape();

...the scraped data will be saved to the file which you specified.
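
The two modes above (return the data, or write it to a file) follow a common pattern that can be sketched as below. The function `deliver` is a hypothetical stand-in, not part of Sorcerer. The dump format written to the file is an assumption, and the nullable types assume PHP 7.1 or later.

```php
<?php
// Hypothetical stand-in for the two modes described above; it is not the
// Sorcerer implementation and the output format is an assumption.
function deliver(array $data, ?string $savefile = null): ?array
{
    if ($savefile === null) {
        return $data; // no file configured: hand the data back
    }
    // File configured: write a readable dump and return nothing.
    file_put_contents($savefile, print_r($data, true));
    return null;
}

$data = ['links' => ['About us'], 'images' => ['logo.png']];

// Mode 1: no save file, so the data comes back to the caller.
$returned = deliver($data);

// Mode 2: a save file in the system temp directory (path chosen for the demo).
$savefile = sys_get_temp_dir() . '/sorcerer-demo.txt';
$saved = deliver($data, $savefile);

echo is_file($savefile) ? 'saved' : 'not saved', PHP_EOL;
```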

Issues

If you have any issues at all, please post your findings in the issues page at https://github.com/gavinggordon/sorcerer/issues.

License

This package is released under the MIT License.


Files (6)

	.travis.yml (Data: auxiliary data)
	composer.json (Data: auxiliary data)
	LICENSE.txt (Doc.: documentation)
	phpunit.xml (Data: auxiliary data)
	README.md (Doc.: documentation)
	src/Http/Data/Collection/Sorcerer.php (Class: class source)
