Question

Compare strings and return words occurring in both

Hello everyone, I have a (for me) tricky question. For the project I am working on, I have to create a large number of objects that each contain aggregated information from a subset of other objects.

To make this more concrete: we use products that align with a WBS number in SAP. Unfortunately, when the SAP system was set up, people were not aware that multiple products would align to the same WBS. For SAP this is a no-go when trying to process them at the same time. To address this, we create a new combined object based on the subset of original objects.

Let's say we have an "Add power cable - high voltage" and a "Renew power cable - high voltage" (both align to WBS .006). To accommodate SAP's requirement, I want to create a new object called "power cable - high voltage". This name is based on the words that occur in both original products, leaving the differences out.

I have looked at Community Commons but found no solution there. Is there a way to solve this issue?

asked 2019-02-28

Bart van den Heuvel

4 answers

Erwin 't Hoen · Answer 1 · 2019-02-28

In your example the ending of the strings is equal, but how do you determine that the two strings are equal enough?

Are they equal when one word is different, or is this different for different strings?

Once you have determined the rule for when strings are equal enough, you can set up a function that finds such strings by searching and string manipulation. You may then get to a point where the variants of an item can be related to each other. If there is a consistent pattern, the function could, for example, remove the first word and so consolidate automatically. If there is little consistency, manual action might still be needed.

With Java you can find the matching words in two strings and then determine the action you would like to take. The example below could help you get started:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CommonWords
{
	public static void main(String[] args) throws IOException
	{
		BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

		System.out.print("Enter two sentences, one per line: ");
		String str1 = br.readLine();
		String str2 = br.readLine();

		// Split each sentence into words on whitespace and on '?', '.' or '!'.
		// The last word is kept even if the sentence has no terminator.
		String[] s1 = str1.trim().split("[\\s?.!]+");
		List<String> s2 = new ArrayList<>(Arrays.asList(str2.trim().split("[\\s?.!]+")));

		// Now compare the words of the two sentences
		for (String word : s1) {
			for (int j = 0; j < s2.size(); j++) {
				// If a match is found, print the word and remove it from the
				// second list to avoid repetition
				if (!word.isEmpty() && word.equalsIgnoreCase(s2.get(j))) {
					System.out.println(word);
					s2.remove(j);
					break;
				}
			}
		}
	}
}

Paul Moes TimeSeries · Answer 2 · 2019-02-28

You could wrap this in a Java action: https://karussell.wordpress.com/2011/04/14/longest-common-substring-algorithm-in-java/
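For illustration, here is a minimal sketch of a longest-common-substring function using the standard dynamic-programming approach. It is not the code from the linked post, and the class and method names are made up for this example. It compares characters, not words.

```java
final class LongestCommonSubstring {

    // Returns the longest run of consecutive characters that occurs in both inputs.
    // len[i][j] holds the length of the common run ending at a[i-1] and b[j-1].
    static String longestCommonSubstring(String a, String b) {
        int[][] len = new int[a.length() + 1][b.length() + 1];
        int best = 0;     // length of the longest run found so far
        int endInA = 0;   // index in a just past the end of that run
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > best) {
                        best = len[i][j];
                        endInA = i;
                    }
                }
            }
        }
        return a.substring(endInA - best, endInA);
    }

    public static void main(String[] args) {
        String common = longestCommonSubstring(
                "Add power cable - high voltage",
                "Renew power cable - high voltage");
        // trim() drops the leading space that is part of the common run
        System.out.println(common.trim());
    }
}
```

Because the match is character based, it can cut through words: for "Vervangen Kabel (LS)" and "Verwijderen Kabel (LS)" the result is "en Kabel (LS)", so a word-based comparison may fit the question better.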

𝕿𝖎𝖒 𝖛𝖆𝖓 𝕾𝖙𝖊𝖊𝖓𝖇𝖊𝖗𝖌𝖊𝖓 · Answer 3 · 2019-03-01

I highly recommend creating a mapping table. That makes it a laborious task, but a simple, traceable and predictable one:

Add power cable - high voltage → Output: power cable - high voltage

Renew power cable - high voltage → Output: power cable - high voltage

Leggen Kabel (LS) → Output: Kabel (LS)

Verwijderen Kabel (LS) → Output: Kabel (LS)

Vervangen Kabel (LS) → Output: Kabel (LS)

Leggen Kabel (LS) → Output: Kabel (LS)

Verwijderen Kabel groen (LS) → Output: Kabel (LS)

Vervangen Kabel met een leuk patroon (LS) → Output: Kabel (LS)

If you make a custom function for this, then it will be hard to ensure that you will not get an unexpected result for one or two of the objects in your large dataset. The fact that you will be processing a large number of objects should not stop you from building the mapping table, since you need to be certain of your results. You don’t want an object to be grouped under the wrong WBS element with unknown financial consequences in SAP.

If you decide to create a custom function, you should extensively test it.
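To show what a mapping-table lookup could look like in a Java action, here is a minimal sketch. The class, field and method names are hypothetical, and the table is hard-coded here only for illustration; in practice it would presumably be maintained as data. The point is that an unmapped product fails loudly instead of being grouped by guesswork.

```java
import java.util.Map;

final class ProductNameMapping {

    // Hypothetical mapping table: original product name -> combined object name.
    static final Map<String, String> COMBINED_NAME = Map.of(
            "Add power cable - high voltage", "power cable - high voltage",
            "Renew power cable - high voltage", "power cable - high voltage",
            "Leggen Kabel (LS)", "Kabel (LS)",
            "Verwijderen Kabel groen (LS)", "Kabel (LS)");

    // Looks up the combined name; a product without a mapping is rejected
    // so that it cannot silently end up under the wrong name.
    static String combinedNameFor(String productName) {
        String combined = COMBINED_NAME.get(productName);
        if (combined == null) {
            throw new IllegalArgumentException("No mapping defined for: " + productName);
        }
        return combined;
    }

    public static void main(String[] args) {
        System.out.println(combinedNameFor("Verwijderen Kabel groen (LS)"));
    }
}
```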

Martin Koelewijn · Answer 4 · 2019-03-01

I’d try something like this: create a microflow that receives a list of objects with your string attribute. Use the Java action StringSplit (in CommunityCommons) with a space as delimiter on the first object's string. You then have a list of SplitItems. Then loop through the other objects and do the same StringSplit for each; in a nested loop, go through your first list of SplitItems to check whether each word also occurs in the new list. If not, remove it from the first list. In the end, the first list should only contain SplitItems (words) that occur in all input objects. Then concatenate them back into a string.
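The same steps could also be sketched in plain Java, for instance inside a single Java action. This is only an illustration under assumptions: the class and method names are made up, the input is a plain list of strings, not Mendix objects, and the comparison is case sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CommonWordsAcrossNames {

    // Keeps only the words of the first name that also occur in every other name,
    // preserving the word order of the first name.
    static String commonWords(List<String> names) {
        if (names.isEmpty()) {
            return "";
        }
        // Split the first name on whitespace; this is the candidate list.
        List<String> common = new ArrayList<>(Arrays.asList(names.get(0).trim().split("\\s+")));
        for (String name : names.subList(1, names.size())) {
            List<String> words = Arrays.asList(name.trim().split("\\s+"));
            // Remove every candidate word that does not occur in this name.
            common.removeIf(word -> !words.contains(word));
        }
        // Concatenate the remaining words back into one string.
        return String.join(" ", common);
    }

    public static void main(String[] args) {
        System.out.println(commonWords(List.of(
                "Add power cable - high voltage",
                "Renew power cable - high voltage")));
    }
}
```

For the two example products from the question this yields "power cable - high voltage". Whether a standalone "-" should count as a word is a rule you would still have to decide on.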