How to Grab a String from This to That from a Webpage's Source?

user2581346

I have looked all over PHP.net and could not figure out whether PHP has a function or set of functions that can grab the part of a string that runs from one marker to another.

For example, this is what I currently have. I want to grab everything from "wgCategories" to "wgMonthNamesShort" in the webpage source stored in $html:

<?php
error_reporting(E_ALL);
$html = file_get_contents('http://en.wikipedia.org/wiki/Los_Angeles');
$string = <>;
?>

First, I grabbed the webpage's source into the $html variable. Now I need a function or set of functions that can grab everything from "wgCategories" to "wgMonthNamesShort" and store it into $string.

Desired result:

$string = "wgCategories":["All articles with dead external links","Articles with dead external links from March 2013","Articles with dead external links from March 2014","Pages with broken reference names","Articles with dead external links from January 2014","Articles with dead external links from September 2011","Articles with dead external links from October 2011","CS1 errors: dates","Use mdy dates from May 2014","Wikipedia indefinitely semi-protected pages","Wikipedia indefinitely move-protected pages","Coordinates on Wikidata","Articles including recorded pronunciations","Articles containing Spanish-language text","All articles with unsourced statements","Articles with unsourced statements from December 2013","Spoken articles","Articles with hAudio microformats","Los Angeles, California","Cities in Los Angeles County, California","Communities on U.S. Route 66","County seats in California","Incorporated cities and towns in California","Populated coastal places in California","Populated places established in 1781","Port cities and towns of the United States Pacific coast","Butterfield Overland Mail in California","Stockton - Los Angeles Road"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort";

Lastly, please note that everything from "wgCategories" to "wgMonthNamesShort" is stored in between <script> tags (Not sure if this is important, but someone told me it is worth mentioning).

Let me know if clarification is needed.

anubhava

You can use preg_match with the s flag (DOTALL) to grab the string between two keywords:

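error_reporting(E_ALL);
$html = file_get_contents('http://en.wikipedia.org/wiki/Los_Angeles');
// s (DOTALL) lets . match newlines, i ignores case, .*? stops at the first end keyword
if (preg_match('/wgCategories.*?wgMonthNamesShort/is', $html, $matches)) {
    echo $matches[0]; // the whole matched substring, both keywords included
}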

You can also avoid regex and do this with PHP string functions such as stristr.

The preg_match code above prints:

wgCategories":["All articles with dead external links","Articles with dead external links from March 2013","Articles with dead external links from March 2014","Pages with broken reference names","Articles with dead external links from January 2014","Articles with dead external links from September 2011","Articles with dead external links from October 2011","CS1 errors: dates","Use mdy dates from May 2014","Wikipedia indefinitely semi-protected pages","Wikipedia indefinitely move-protected pages","Coordinates on Wikidata","Articles including recorded pronunciations","Articles containing Spanish-language text","All articles with unsourced statements","Articles with unsourced statements from December 2013","Spoken articles","Articles with hAudio microformats","Los Angeles, California","Cities in Los Angeles County, California","Communities on U.S. Route 66","County seats in California","Incorporated cities and towns in California","Populated coastal places in California","Populated places established in 1781","Port cities and towns of the United States Pacific coast","Butterfield Overland Mail in California","Stockton - Los Angeles Road"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort
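The string-function route mentioned above could look roughly like the sketch below. It uses stripos and substr rather than stristr, since they make the end offset easy to compute. The helper name grab_between and the short stand-in value for $html are assumptions made for illustration, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
/**
 * Return the part of $haystack that starts at $start and ends at $end
 * (both markers included), or null when either marker is missing.
 * Hypothetical helper; matching is case-insensitive, like the /i flag.
 */
function grab_between(string $haystack, string $start, string $end): ?string
{
    $from = stripos($haystack, $start);
    if ($from === false) {
        return null;
    }
    // Search for the end marker only after the start marker.
    $to = stripos($haystack, $end, $from + strlen($start));
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to + strlen($end) - $from);
}

// Small stand-in for the page source so the sketch runs without a network call.
$html = '<script>{"wgTitle":"Los Angeles","wgCategories":["Spoken articles"],"wgMonthNamesShort":["","Jan"]}</script>';
$string = grab_between($html, 'wgCategories', 'wgMonthNamesShort');
echo $string, "\n";
```

With the real page, $html would come from file_get_contents as in the question. Checking for null before using $string covers the case where the page layout changes and a marker is no longer present.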
