Java: How to lock a code block based on a certain condition?


Edit: I've added an example table (see the Google Sheets link) and what the resulting apple object should look like.

I've programmed a multi-threaded web scraper using jsoup that extracts information from a website and saves it in a map. The main thing I can't get to work is that the program should not connect to the website if it has already scraped the information.

Information about the program

It extracts information from a table on the website and starts a thread for every word in the table.

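So each thread is started with its word as a class member, and every thread shares the same ConcurrentHashMap object. My plan is to check whether the word already exists in the map as a key.
If not, the thread should connect to the website to get information about the word, add its data, and put the result in the map afterwards.
If the map already contains the word, the thread should get the value from the map and add its data to it.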

So the main goal is not to connect to the website twice for the same word.

Here are the relevant code snippets:

Main class
It starts a thread for every word in the table. "element" contains the word and the URL with more information about the word.

    for (Element element : allRelevantTableElements) {
        executorService.execute(new Worker(element, data, concurrentMap));
    }
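
The loop above only submits tasks; before reading the map, the main thread has to wait for them. A minimal sketch, assuming a fixed pool of 4 threads and hypothetical (word, data) pairs standing in for the jsoup Elements, with a placeholder task instead of the real Worker: `shutdown()` stops new submissions and `awaitTermination(...)` waits for the submitted ones.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ScrapeMain {
    // Rows are hypothetical (word, data) pairs standing in for jsoup Elements.
    static ConcurrentMap<String, Integer> runAll(List<Map.Entry<String, String>> rows) {
        ConcurrentMap<String, Integer> concurrentMap = new ConcurrentHashMap<>();
        ExecutorService executorService = Executors.newFixedThreadPool(4);
        for (Map.Entry<String, String> row : rows) {
            // Placeholder task: merge() atomically counts the rows per word.
            executorService.execute(() -> concurrentMap.merge(row.getKey(), 1, Integer::sum));
        }
        // Accept no new tasks, then wait for the submitted ones to finish
        // so the map is complete when it is returned.
        executorService.shutdown();
        try {
            if (!executorService.awaitTermination(1, TimeUnit.MINUTES)) {
                executorService.shutdownNow();
            }
        } catch (InterruptedException e) {
            executorService.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return concurrentMap;
    }
}
```

Without the `awaitTermination` call, the main thread could read the map while workers are still scraping.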

Worker class
1. Check if the word is in the map.
2a. If it is in the map, add the data to it.
2b. If it is not in the map, scrape the information from the website and add the data to it.

    public class Worker implements Runnable {

        private final Element element;  // jsoup element: the word and its link
        private final String data;      // e.g. "1,20€"
        private final ConcurrentMap<String, Fruit> concurrentMap;

        public Worker(Element element, String data, ConcurrentMap<String, Fruit> concurrentMap) {
            this.element = element;
            this.data = data;
            this.concurrentMap = concurrentMap;
        }

        @Override
        public void run() {
            String word = element.text();
            Fruit fruit;

            // Check-then-act: two threads with the same word can both
            // see "not present" here and both scrape the website.
            if (concurrentMap.containsKey(word)) {
                fruit = concurrentMap.get(word);
                fruit.addData(data);
            } else {
                MyWebScraper scraper = new MyWebScraper("http://fruitinformation.com" + element.attr("href"));
                scraper.connect();
                fruit = scraper.getInformation();
                fruit.addData(data);
            }

            concurrentMap.put(word, fruit);
        }
    }

Example
Let's say the table looks like this:

https://docs.google.com/spreadsheets/d/1jf8sh8sp9y0sv3xb5mlisgcjp5s_dhasp3kbnqla248/edit?usp=sharing

The main class will start 3 threads:
Thread 1: element contains "apple" and the sub-URL "/apple",
data contains "1,20€"
Thread 2: element contains "orange" and the sub-URL "/orange",
data contains "2,40€"
Thread 3: element contains "apple" and the sub-URL "/apple",
data contains "1,50€"

The problem is that the threads run simultaneously, so threads 1 and 3 both check whether "apple" is in the map and both get false. Both then connect to the website fruitinformation.com/apple, although the basic information about apples is needed only once. Both add their data to the returned object and put it in the map: thread 1 first puts the apple with "1,20€", and then thread 3 overwrites it with an apple holding "1,50€".
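
The interleaving described above can be reproduced deterministically. A minimal sketch, assuming a hypothetical `scrape` method that only counts calls instead of contacting fruitinformation.com: a `CyclicBarrier` makes both "apple" threads finish their `containsKey` check before either one puts, so both scrape.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class CheckThenActRace {
    // Counts how often the website would have been contacted.
    static final AtomicInteger scrapes = new AtomicInteger();

    // Hypothetical stand-in for connecting to fruitinformation.com.
    static String scrape(String word) {
        scrapes.incrementAndGet();
        return "basic information about " + word;
    }

    static int run() {
        scrapes.set(0);
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        // Both threads must pass containsKey() before either continues,
        // which forces the unlucky interleaving from the question.
        CyclicBarrier bothChecked = new CyclicBarrier(2);
        Runnable worker = () -> {
            boolean present = map.containsKey("apple");
            try {
                bothChecked.await();
            } catch (InterruptedException | BrokenBarrierException e) {
                throw new IllegalStateException(e);
            }
            // containsKey() and put() are each atomic, but not together.
            if (!present) {
                map.put("apple", scrape("apple"));
            }
        };
        Thread thread1 = new Thread(worker);
        Thread thread3 = new Thread(worker);
        thread1.start();
        thread3.start();
        try {
            thread1.join();
            thread3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return scrapes.get();
    }
}
```

Without the barrier the double scrape only happens when the timing is unlucky, which is why it is easy to miss in testing.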

However, my goal is that only one apple thread connects to the website and adds its data (for example 1,20€), and the other one realizes that the apple object already exists in the map and adds its data (1,50€) to the existing apple. The Fruit objects have lists for that.
The resulting map entry should look like this:
key = apple, value = Fruit["apple", basicInformationFromWebsite, List["1,20€", "1,50€"]]
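
Since two workers may add data to the same Fruit at the same time, its list has to be thread-safe too. A minimal sketch of such a Fruit, assuming it only needs a name, the scraped information as a string, and a price list (the field and method names are hypothetical, modeled on the `addData` call in the question):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical shape of the question's Fruit: name, scraped info, list of prices.
class Fruit {
    private final String name;
    private final String basicInformation;
    // Several workers may call addData() on the same Fruit at once,
    // so the list must be thread-safe; a plain ArrayList is not.
    private final List<String> data = new CopyOnWriteArrayList<>();

    Fruit(String name, String basicInformation) {
        this.name = name;
        this.basicInformation = basicInformation;
    }

    void addData(String value) {
        data.add(value);
    }

    // Returns an immutable snapshot of the prices added so far.
    List<String> getData() {
        return List.copyOf(data);
    }

    @Override
    public String toString() {
        return "Fruit[" + name + ", " + basicInformation + ", " + data + "]";
    }
}
```

`CopyOnWriteArrayList` suits a list that is written a few times per fruit; `Collections.synchronizedList` would also work.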

The other thread (orange) should run totally unaffected by this. Threads for different fruits should run simultaneously, but threads for the same fruit have to respect each other somehow. Is there a type of synchronization that blocks instances with the same fruit name but doesn't block other instances?
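
One way to get synchronization per fruit name is a lock object per word. A minimal sketch, assuming a hypothetical `scrape` method and a simplified `Fruit` with a thread-safe price list: `locks.computeIfAbsent` hands every thread with the same word the same lock object, so the check and the scrape happen as one step per word while other words proceed in parallel.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PerKeyLocking {
    static final class Fruit {
        final String name;
        final String info;
        final List<String> data = new CopyOnWriteArrayList<>();
        Fruit(String name, String info) { this.name = name; this.info = info; }
    }

    final ConcurrentMap<String, Fruit> fruits = new ConcurrentHashMap<>();
    // One lock object per word; threads with different words never share a lock.
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();
    final AtomicInteger scrapes = new AtomicInteger();

    // Hypothetical stand-in for connecting to the website with jsoup.
    private Fruit scrape(String word) {
        scrapes.incrementAndGet();
        return new Fruit(word, "basic info about " + word);
    }

    void process(String word, String price) {
        Object lock = locks.computeIfAbsent(word, k -> new Object());
        synchronized (lock) {
            // Only threads for the same word wait here, so the check
            // and the scrape happen as one step per word.
            Fruit fruit = fruits.get(word);
            if (fruit == null) {
                fruit = scrape(word);
                fruits.put(word, fruit);
            }
            fruit.data.add(price);
        }
    }

    static PerKeyLocking runExample() {
        PerKeyLocking scraper = new PerKeyLocking();
        ExecutorService pool = Executors.newFixedThreadPool(3);
        pool.execute(() -> scraper.process("apple", "1,20"));
        pool.execute(() -> scraper.process("orange", "2,40"));
        pool.execute(() -> scraper.process("apple", "1,50"));
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return scraper;
    }
}
```

The lock map grows by one entry per distinct word and is never cleaned up, which seems acceptable for a single table of words.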


I've read a lot about synchronization, locks, etc., but I can't find a solution to my problem.
It would be nice if someone could help me. Thanks in advance!

This is an XY problem. Synchronization alone won't fix this: even if you implement it, the second thread will just be blocked by the first and then proceed to do the unwanted crawl.

You could add a set of words whose processing has already begun, or add a dummy element to the map that shows the word is being processed even though it is not complete yet.
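
The dummy-element idea can be sketched with a `CompletableFuture` as the placeholder: `putIfAbsent` lets exactly one thread per word claim the entry, and latecomers call `join()` to wait for the scraped object before adding their data. This is one possible variant, not the answerer's code; `scrape` and the simplified `Fruit` are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PlaceholderFutures {
    static final class Fruit {
        final String name;
        final List<String> data = new CopyOnWriteArrayList<>();
        Fruit(String name) { this.name = name; }
    }

    // A future per word: present means "scraped or currently being scraped".
    final ConcurrentMap<String, CompletableFuture<Fruit>> fruits = new ConcurrentHashMap<>();
    final AtomicInteger scrapes = new AtomicInteger();

    // Hypothetical stand-in for the jsoup request.
    private Fruit scrape(String word) {
        scrapes.incrementAndGet();
        return new Fruit(word);
    }

    void process(String word, String price) {
        CompletableFuture<Fruit> placeholder = new CompletableFuture<>();
        CompletableFuture<Fruit> existing = fruits.putIfAbsent(word, placeholder);
        if (existing == null) {
            // This thread claimed the word, so it alone scrapes (outside any lock).
            try {
                placeholder.complete(scrape(word));
            } catch (RuntimeException e) {
                fruits.remove(word, placeholder); // let a later thread retry
                placeholder.completeExceptionally(e);
                throw e;
            }
            existing = placeholder;
        }
        // Threads for the same word wait here until the scrape is done;
        // threads for other words are unaffected.
        existing.join().data.add(price);
    }

    static PlaceholderFutures runExample() {
        PlaceholderFutures scraper = new PlaceholderFutures();
        ExecutorService pool = Executors.newFixedThreadPool(3);
        pool.execute(() -> scraper.process("apple", "1,20"));
        pool.execute(() -> scraper.process("orange", "2,40"));
        pool.execute(() -> scraper.process("apple", "1,50"));
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return scraper;
    }
}
```

Unlike calling the scraper inside `ConcurrentHashMap.computeIfAbsent`, the slow network call here runs outside any map lock, so it cannot hold up unrelated keys.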

