
F32 RSS grabber problems

by George Morgan

Selecting and filtering content

We have two ways to select and remove certain content:

We usually

Details on E42-E49

Description

Syndication blogs have different test cases and test data than the European ones. They have different b

Case 1:

EconoMonitor blog: the links carry Google Analytics goal-tracking code in an onclick attribute placed before href.

 example1: <a onclick="__gaTracker('send', 'event', 'outbound-article', 'http://www.economist.com/blogs/americasview/2014/08/argentina-and-holdouts', 'the Economist described the stand-off');" href="http://www.economist.com/blogs/americasview/2014/08/argentina-and-holdouts">the Economist described the stand-off</a>

Case 2:

FT Alphaville: the title attribute comes before href.

 example2: <a title="Halliburton and Baker Hughes abandon $28bn tie-up – FT.com" href="http://www.ft.com/intl/cms/s/0/a6389b4e-0fe0-11e6-839f-2922947098f0.html#axzz47SCJ9Ubf">antitrust authorities</a>

So we retested all the functionality and edited all our regular expressions to look for an attribute not only at the beginning or the end of the tag, but anywhere inside it, regardless of position. This fixed all the bugs in the software, regardless of the test data.
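The difference between a position-dependent and a position-independent pattern can be sketched as follows. This is only an illustration in Python, not the regular expressions actually used in the software: the names OLD_LINK, NEW_LINK and extract_links are hypothetical, and the "old" pattern is an assumption about what a pattern that expects href right after the tag name might look like.

```python
import re

# Hypothetical "old" style: href is expected directly after "<a ".
OLD_LINK = re.compile(r'<a\s+href="([^"]*)"[^>]*>(.*?)</a>', re.I | re.S)

# Hypothetical "new" style: href may appear anywhere in the opening tag.
# [^>]*? skips any attributes (onclick, title, ...) that come before href.
NEW_LINK = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*"([^"]*)"[^>]*>(.*?)</a>', re.I | re.S
)

# Test data modeled on the two cases above.
CASE1 = '''<a onclick="__gaTracker('send', 'event', 'outbound-article', 'http://www.economist.com/blogs/americasview/2014/08/argentina-and-holdouts', 'the Economist described the stand-off');" href="http://www.economist.com/blogs/americasview/2014/08/argentina-and-holdouts">the Economist described the stand-off</a>'''
CASE2 = '''<a title="Halliburton and Baker Hughes abandon $28bn tie-up – FT.com" href="http://www.ft.com/intl/cms/s/0/a6389b4e-0fe0-11e6-839f-2922947098f0.html#axzz47SCJ9Ubf">antitrust authorities</a>'''


def extract_links(html, pattern=NEW_LINK):
    """Return a list of (href, link text) pairs found in html."""
    return pattern.findall(html)


if __name__ == "__main__":
    for sample in (CASE1, CASE2):
        print("old:", extract_links(sample, OLD_LINK))
        print("new:", extract_links(sample, NEW_LINK))
```

With the assumed old pattern, both samples yield no links because another attribute sits between the tag name and href; the position-independent pattern finds the link in both.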

Differences between the old and the new regex

PHP Libraries

In December 2020, we realized that many libraries were not compatible.

 
Example: XPath

Extracts the text from a web page.
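As a rough illustration of what XPath-based text extraction does, here is a minimal sketch. It is written in Python with the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath, and is not the PHP library the software depends on. The sample page and the function names page_text and page_links are made up for the example, and the input is assumed to be well-formed markup; real pages usually need a forgiving HTML parser first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real pages are rarely this clean).
PAGE = """<html><body>
<h1>Title</h1>
<p>First <a href="http://example.com/">link</a> paragraph.</p>
</body></html>"""


def page_text(markup):
    """Return the visible text under <body>, joined with single spaces."""
    root = ET.fromstring(markup)
    body = root.find(".//body")  # XPath-style path: first <body> descendant
    chunks = (t.strip() for t in body.itertext())
    return " ".join(c for c in chunks if c)


def page_links(markup):
    """Return the href of every <a> element that has an href attribute."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.findall(".//a[@href]")]


if __name__ == "__main__":
    print(page_text(PAGE))   # Title First link paragraph.
    print(page_links(PAGE))  # ['http://example.com/']
```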


Two options: rewrite the libraries, or find compatible versions of them, although this is old software.

 
