Two-way sorting/finding

Arokh

Centurion
Joined
Apr 11, 2006
Messages
124
I just recently disconvered the possibillity to sort an arraylist
of classes using an IComparer and it works perfectly.

But now I have another problem:
I have for example the class:
Code:
Public Class FileStrc
    Public FilePath as String
    Public Ed2k as String
End Class

To speed up the process to find the right class,
I need to find it sometimes by FilePath and sometimes by Ed2k.
Now using the IComparer only allows me to sort it in one way.

Is there a "simple", standard way to do this or do I have to program it myself?



If I have to do it myself:
How can I overwrite the sorting method the arraylist class provides?

My idea would be then to create two arrays which
contain the indeces in the right order and then use it to sort/find my class.
This also means I would have to update the arrays once a class is edited or added.

This should work or am I missing something?
 
One way is to implement the IComparer in a nested class like
Visual Basic:
Public Class FileStrc
    Public FilePath As String
    Public Ed2k As String

    Public Class FilePathComparer
        Implements IComparer

        Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements System.Collections.IComparer.Compare
            Dim s1 As FileStrc = DirectCast(x, FileStrc)
            Dim s2 As FileStrc = DirectCast(y, FileStrc)

            Return s1.FilePath.CompareTo(s2.FilePath)

        End Function
    End Class

    Public Class Ed2kComparer
        Implements IComparer

        Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements System.Collections.IComparer.Compare
            Dim s1 As FileStrc = DirectCast(x, FileStrc)
            Dim s2 As FileStrc = DirectCast(y, FileStrc)

            Return s1.Ed2k.CompareTo(s2.Ed2k)

        End Function
    End Class

End Class

This can then be used (once you have added error handling and such like anyway ;)) like this
Visual Basic:
Dim x As ArrayList

Dim fs As FileStrc
fs = New FileStrc
fs.Ed2k = "Test"
fs.FilePath = "Test"
x.Add(fs)

fs = New FileStrc
fs.Ed2k = "One"
fs.FilePath = "Three"
x.Add(fs)

fs = New FileStrc
fs.Ed2k = "Three"
fs.FilePath = "One"
x.Add(fs)

'Note how we tell the .Sort method which one to use...
x.Sort(New FileStrc.FilePathComparer)
x.Sort(New FileStrc.Ed2kComparer)

If you are using .Net then I would strongly recommend using generics to make this even easier - the new class would look like
Visual Basic:
Public Class FileStrc
    Public FilePath As String
    Public Ed2k As String

    Public Class FilePathComparer
        Implements IComparer(Of FileStrc)

        Public Function Compare(ByVal x As FileStrc, ByVal y As FileStrc) As Integer Implements System.Collections.Generic.IComparer(Of FileStrc).Compare

            Return x.FilePath.CompareTo(y.FilePath)

        End Function
    End Class

    Public Class Ed2kComparer
        Implements IComparer(Of FileStrc)

        Public Function Compare1(ByVal x As FileStrc, ByVal y As FileStrc) As Integer Implements System.Collections.Generic.IComparer(Of FileStrc).Compare

            Return x.Ed2k.CompareTo(y.Ed2k)

        End Function
    End Class

End Class

and the calling code would be
Visual Basic:
Dim x As New List(Of FileStrc)   'List(of ...) rather than ArrayList

Dim fs As FileStrc
fs = New FileStrc
fs.Ed2k = "Test"
fs.FilePath = "Test"
x.Add(fs)

fs = New FileStrc
fs.Ed2k = "One"
fs.FilePath = "Three"
x.Add(fs)

fs = New FileStrc
fs.Ed2k = "Three"
fs.FilePath = "One"
x.Add(fs)

'Note how we tell the .Sort method which one to use...
x.Sort(New FileStrc.FilePathComparer)
x.Sort(New FileStrc.Ed2kComparer)
 
I don't know if you misunderstood me or if I'm not getting the picture of the code:
So I apologize if I still talk about the problem altough its solved.

I'll try to explain it a bit more detailed:

As I said in my post I have to find the correct FileStrc in an array by either the FilePath or the Ed2k hashstring.
When I'm adding files to the array I have to make sure that there are no duplicates.
At first I simply looped through the array to find matches and then prevent those file from being added.
Since this takes too long I thought sorting and searching in a more clever way would solve my problem and it did.

Now the FileStrc will be filled with alot of data (I just simplified the Class) by pulling it from a DB.
Since I'm working with threads and do alot of stuff simultaneously,
I don't know which FileStrc is meant by the data pulled from the DB.
The only thing I have to find the corresponding FileStrc is the ed2k hash,
since they both contain the hash.
Now I also would like to "properly" search for the entry instead of just looping through the array.

Now to the example you've given me (correct me if I'm wrong):
I've played around with it a bit and it sorts "physically" the array,
meaning I can only take advantage of the sorted array in one way.
If I wanted to access the array sorted in the other way I would have to call the sort method again.

Now the sorting which I require (FilePath or Ed2k) is dependent on the users actions and
it is possible that I often have to "switch" between the different sort orders.
So it feels like that the great deal of sorting would defeat the purpose to make things faster.


Also I'm quiet interessted in the difference between Arraylist() and a List() since I can use them in the same way.
Is List() a more simpler, "lightweight" class than an ArrayList?
 
Second bit first ;) ArrayList isn't 'strongly typed' all it's methods accept and return Object as a data-type - this can lead to runtime erros (adding Option Strict On at the top of your source files can highlight the places where this is a problem).

List(Of ...) on the other hand is strongly typed and will ensure that you are only passing / receiving the correct data types.

If you wish to loop over the variables to remove duplicates then sorting the array is probably not the best. An alternate approach (especially if your class only contains two data items) could be to use a dictionary object.

Visual Basic:
        Dim d As New Dictionary(Of String, String)

        'Use as if it was d(Hash, FilePath)

        d.Add("One", "Three")
        d.Add("Two", "Two")
        d.Add("Three", "One")

        If d.ContainsKey("One") Then
            'Hash One was present
        End If

        If d.ContainsValue("One") Then
            'FilePath One was present
        End If
 
out of curiosity:
If I used List(of Object) would it deafeat the purpose of the List Class?
And I could use just as well an ArrayList or are there even more differences?

Unfortunately the FileStrc is pretty big:
It contains alot of informations about video files.
Like from which series it belongs to, Episodename, the group which released it etc,
and alot of information about which codec was used, the duration... I guess you get the picture.


I already started to write my own ArrayList by inheriting it and add some indexarrays to it.
Though it is not as easy as I hoped it to be he he.
I had alot of problems getting the search algorithm to work.

Here is what I already got:
Virtual ArrayList Class:
Visual Basic:
Public Class VArrayList : Inherits ArrayList
    Dim Indeces As New ArrayList 'ALIndex Class  Sorting Classes

    Public Sub AddIndex(ByVal IndexName As String, ByVal Comparer As IComparer) 'Add SortIndex to Indeces
        Dim Index As Int32 = Indeces.BinarySearch(IndexName, New IndexHierarchy)

        If Index < 0 Then 'Sorting name unique?
            Dim NewALIndex As New ALIndex
            With NewALIndex
                .Name = IndexName
                .Comparer = Comparer
                Dim I, J As Int32
                For I = 0 To Count - 1
                    .Sequence.Add(I) 'Seize array to ContentArray and fill it with Indexes (unsorted)
                Next

                Dim Buffer As Int32
                Dim Lowest As Int32
                '######Sort IndexArray to reflect correct sorted ContentArray#####
                For I = 0 To Count - 1
                    Lowest = I
                    For J = I To Count - 1
                        If .Comparer.Compare(Item(.Sequence(Lowest)), Item(.Sequence(J))) = 1 Then
                            Lowest = J
                        End If
                    Next
                    If Lowest > 0 Then
                        Buffer = .Sequence(I)
                        .Sequence(I) = .Sequence(Lowest)
                        .Sequence(Lowest) = Buffer
                    End If
                Next
                '#################################################################
            End With
            Indeces.Insert(Index Xor -1, NewALIndex) 'Add it to Indeces (Keep Indeces sorted)
        End If

    End Sub

    Public Function VAdd(ByVal Item As Object) As Int32 'AddItem to ContentArray and update sorting
        For I As Int32 = 0 To Indeces.Count - 1 'Go through all sortindeces
            With DirectCast(Indeces(I), ALIndex)
                Dim Index As Int32 = VSearch(Item, .Name) 'Search for Position in IndexArray

                If Index < 0 Then
                    .Sequence.Insert(Index Xor -1, Count) 'Outside of arraybound
                Else
                    .Sequence.Insert(Index, Count - 1) 'inside arraybound
                End If
            End With
        Next
        Add(Item) 'Add item to ContentArray
    End Function

    Public Overloads Function VSearch(ByVal Search As Object, ByVal IndexName As String) As Int32
        Dim Index As Int32 = Indeces.BinarySearch(IndexName, New IndexHierarchy)
        If Index < 0 Then Throw New System.Exception("Index not found")

        Dim IndexArray As List(Of Int32) = DirectCast(Indeces(Index), ALIndex).Sequence 'Get IndexArray
        Dim Comparer As IComparer = DirectCast(Indeces(Index), ALIndex).Comparer 'Get Comparer

        '######### Search algorithm ################################
        Dim Middle As Integer = CInt(Math.Truncate(Count / 2))
        Dim Result As Integer = Comparer.Compare(Item(IndexArray(Middle)), Search)
        Dim Bound As Integer = Count
        Do
            Bound = CInt(Math.Truncate((Bound - 1) / 2 + 0.5)) 'New Offset

            If Result = 1 Then 'Upper Half
                Middle = Middle - CInt(Math.Truncate(Bound / 2 + 0.5)) 'Substract offset
            ElseIf Result = -1 Then 'Lower Half
                If Middle + 1 >= Count Then Return (Middle + 1) Xor -1 'Greater than any entires
                Middle = Middle + CInt(Math.Truncate(Bound / 2 + 0.5)) 'Add offset
            Else
                Return IndexArray(Middle) 'Match
            End If

            Result = Comparer.Compare(Item(IndexArray(Middle)), Search) 'Compare

            If Bound = 1 Then 'Searched item not inside array
                If Result = 1 Then
                    Return Middle Xor -1
                ElseIf Result = -1 Then
                    Return (Middle + 1) Xor -1
                End If
            End If
        Loop
        '##########################################################
    End Function

    Private Class ALIndex
        Public Name As String 'Sorting Name
        Public Comparer As IComparer 'The way to sort
        Public Sequence As New List(Of Int32) 'Array with indexes in correct order
    End Class

    Public Class IndexHierarchy
        Implements IComparer

        Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements System.Collections.IComparer.Compare
            Dim IndexName As String = DirectCast(x, ALIndex).Name
            Dim SIndexName As String = DirectCast(y, String)

            Return IndexName.CompareTo(SIndexName)
        End Function
    End Class

End Class

Usage:
Visual Basic:
Dim VirtArrayList As New VArrayList
For I as Int32=0 to 20 Step 2 'Fill Array
    VirtArrayList.Add(I)
Next
VirtArrayList.AddIndex("Number", New Heirarchy) 'Add Sort Oder to Array (as numbers)
VirtArrayList.AddIndex("String", New StringHeirarchy) 'Add Sort Oder to Array (as strings)

Dim Index as Int32 = VirtArrayList.VSearch(5, "Number") 'Get index using "Number" search order 
If Index < 0 Then VirtArrayList.VAdd(5) 'add item if it is not inside array, while keeping arrayindeces sorted

Public Class StringHeirarchy
    Implements IComparer

    Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements System.Collections.IComparer.Compare
        Dim xNew As String = CStr(DirectCast(x, Integer))
        Dim yNew As String = CStr(DirectCast(y, Integer))

        Return xNew.CompareTo(yNew)
    End Function
End Class
Public Class Heirarchy
    Implements IComparer
    Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements System.Collections.IComparer.Compare
        Dim xNew As Integer = DirectCast(x, Integer)
        Dim yNew As Integer = DirectCast(y, Integer)

        Return xNew.CompareTo(yNew)
    End Function
End Class

Of course the "String" sort order will return the same indexes as the "Number" sort order,
I just added it to show the possibility to add different sort orders.
This allows me to access both sort orders without having to sort first

I'm still interested if there is a simpler way to achive this and if not,
any optimization to the code is appreciated.
 
Last edited:
Not had time to have a proper look at your code yet - will try to do so later today though....

List(Of Object) is a bit of a pointless construct in most cases as it pretty much does the same as ArrayList.

In your case you might find some mileage in using generics to replace some of the code bits
e.g.
Visual Basic:
Dim Indeces As New ArrayList 'ALIndex Class  Sorting Classes    
'could become
Dim Indeces As New List(Of ALIndex)
'this would remove a lot of the casting currently needed in the rest of the class.
also the ALIndex class and it's comparer could benefit
Visual Basic:
    Public Class ALIndex
        Public Name As String 'Sorting Name
        Public Comparer As IComparer 'The way to sort
        Public Sequence As New List(Of Int32) 'Array with indexes in correct order
    End Class

    Public Class IndexHierarchy
        Implements IComparer(Of ALIndex)

        Public Function Compare1(ByVal x As VArrayList(Of T).ALIndex, ByVal y As VArrayList(Of T).ALIndex) As Integer Implements System.Collections.Generic.IComparer(Of VArrayList(Of T).ALIndex).Compare
            Return x.Name.CompareTo(y.Name)
        End Function
    End Class

Out of interest - it looks like your arraylist derivative is maintaining an internal arraylist and a seperate set of indexes into this arraylist for each field you need to search / sort on. Is that correct?
 
Out of interest - it looks like your arraylist derivative is maintaining an internal arraylist and a seperate set of indexes into this arraylist for each field you need to search / sort on. Is that correct?
That sound about correct.
I'm using the Array provided by the ArrayList Class for the actual content and for the Indexes I created another ArrayList
which contains the fieldindexes of the contentarray needed to have a correctly sorted array.

For example:
Index1: Sorted as Numbers
Index2: Sorted as Strings
Code:
	ContentArray:	Index1:		Index2:
0	1		0		0
1	20		2		3
2	2		4		2
3	100		1		1
4	4		3		4

Onto the ArrayList, List subject.

I tried replacing the ArrayList with List(ALIndex) but it has one major drawback:
When I use Binarysearch on it, it only allows the search parameter also to be a ALIndex Class,
but it is much more convenient to only enter the Name (String) of the ALIndex Class.

I also tried to change the Inherited class to List instead of ArrayList,
but there I also have to set the type of the list.
Is there a way to set the type not until I create a new instance of the class?
 
How large a collection of objects do you have? Does an unsorted list of keys actually perform too slow and you really need them to be sorted or are you attempting to prevent future slowdowns?

It might be easier to maintain a single list of your FileStrc objects and have a couple of dictionaries that maintain the key / object value (or key / index values). This could potentially remove the need for you to do any sorting or similar. You could make this easier still by providing an Add method that accepts a FileStrc class plus a FilePath and Ed2K as additional parameters.

You would also need to provide methods to retreive and remove based on either a FilePath or Ed2K which would need to remove entried from all the internal lists but that wouldn't be a large chore (not compared to your current method anyway).
 
It might be easier to maintain a single list of your FileStrc objects and have a couple of dictionaries that maintain the key / object value (or key / index values).
I think this is exactly what I'm doing (I hope I didn't missunderstood).
I have only one array where I store the FileStrc Classes (no sorting done there) and
I have an array which contains the ALIndex Classes which in turn contains
the index values ordered with help of the IComparer which is also stored in the ALIndex Class
(Each ALIndex in the Indeces array has an own sorting scheme and indexlist).
The only difference is that I'm using a List(Of 32) Class instead of a dictionary.

I'm assuming you mean a dictionary(Of Int32, Int32) with the Key being 0..1..2..3..n and
the Value being the index values to the content array,
but then again it would make things more complicated,
since when adding an item I would have to insert the item and
edit the Key entries in it so there aren't any duplicates.
So I guess I didn't understand you there.

How large a collection of objects do you have? Does an unsorted list of keys actually perform too slow and you really need them to be sorted or are you attempting to prevent future slowdowns?
Mostly there are only a few objects in the list (around 20 or so) but there are some cases it might go up to 600 objects.
It took about 6 seconds to add 1280 Files to the Array (sorted) and 32 seconds without sorting.
(Although it was not just adding the FileStrc Classes to the array, some other operations were involved too)

The FileStrc Class looks like this:
Visual Basic:
#Region "Var Structures"
    Class FileStrc
        Public FilePath As New FilePathStrc
        Public OldFilePath As New FilePathStrc
        Public ProcessedOn As New DateTime
        Public DBInfo As New DBStrc
        Public MyList As New MyListStrc
        Public Info As New FileInfoStrc
        Public State As New StateStrc
    End Class

    Class DBStrc
        Public Response As Boolean
        Public FId As Int32
        Public EId As Int32
        Public AId As Int32
        Public GId As Int32
        Public LId As Int32
        Public State As UInt16
        Public GroupLong As String
        Public GroupShort As String
        Public EpNo As String
        Public EpName As New LanguageStrc
        Public Type As String
        Public SeriesName As New LanguageStrc
        Public SeriesNameShort As String
        Public SeriesNameOther As String
        Public Synonym As String
    End Class

    Class LanguageStrc
        Public English As String
        Public Romaji As String
        Public Kanji As String
    End Class

    Class FilePathStrc
        Public Path As String
        Public FileName As String
        Public Property FilePath() As String
            Get
                Return Path & "\" & FileName
            End Get
            Set(ByVal NewPath As String)
                Path = My.Computer.FileSystem.GetParentPath(NewPath)
                FileName = My.Computer.FileSystem.GetName(NewPath)
            End Set
        End Property
    End Class

    Class MyListStrc
        Public State As New MLStateEnum
        Public Storage As String
        Public Add As New AddEnum
        Public Watched As Boolean
        Public Source As String
        Public Other As String
        Public Vote As Single
    End Class

    Class FileInfoStrc
        Public CreationDate As New DateTime
        Public Duration As Double
        Public App As String
        Public Library As String
        Public Extension As String
        Public Size As UInt64
        Public OverHead As Int32
        Public Hashes As New HashesStrc
        Public Title As String
        Public Video As New VideoStrc
        Public Audio As New System.Collections.ArrayList 'AudioStrc
        Public DefaultAudio As Byte
        Public Subtitles As New System.Collections.ArrayList ' SubtitleStrc
        Public DefaultSubtitle As Byte
    End Class

    Class HashesStrc
        Public crc As String
        Public ed2k As String = ""
        Public md5 As String
        Public sha1 As String
        Public tth As String
    End Class

    Class VideoStrc
        Public Frames As Int32
        Public Duration As Double
        Public FPS As Single
        Public Resolution As New Point
        Public AspectRatio As New Point
        Public Codec As New CodecStrc
        Public Identifier As String
        Public Size As Int32
        Public Title As String
        Public Chroma As String
        Public Struc As String
        Public Encoder As String
    End Class

    Class AudioStrc
        Public Size As New Int32
        Public Duration As Double
        Public Language As String
        Public Identifier As String
        Public Channel As Int16
        Public SamplingRate As Int32
        Public Samples As Int32
        Public Title As String
        Public Mode As String
    End Class

    Class SubtitleStrc
        Public Size As Int32
        Public Language As String
        Public Identifier As String
        Public Title As String
    End Class

    Class CodecStrc
        Public Name As String
        Public Settings As String
        Public Stats As String
    End Class

    Class StateStrc
        Public Reason As String
        Public State As New StateEnum
    End Class

    Public Enum AddEnum
        NoAction
        AddToMyList
        InMyList
        NotFound
    End Enum

    Public Enum StateEnum
        UnProcessed = 0
        Processing = 1
        Processed = 2
        Failed = 3
    End Enum

    Public Enum MLStateEnum
        Unknown = 0
        OnHDD = 1
        OnCd = 2
        Deleted = 3
        Share = 4
        Release = 5
    End Enum
#End Region

I've fixed some bugs in my class and added some new functions/properties:
See Attachment (I also added a class to show how it works, just call Usage.DoStuff somewhere)
 

Attachments

Just as a simpler idea would something along the lines of
Visual Basic:
Class VirtualArray2

    Dim Structs As New List(Of FileStrc)

    Dim FileIndex As SortedDictionary(Of String, Int32)
    Dim Ed2kIndex As SortedDictionary(Of String, Int32)

    Public Sub Add(ByVal fileStructure As FileStrc)
        Dim index As Int32 = Structs.Count
        Structs.Insert(index, fileStructure)

        FileIndex.Add(fileStructure.FilePath.Path, index)
        Ed2kIndex.Add(fileStructure.Info.Hashes.ed2k, index)

    End Sub

    Public Function GetByEd2k(ByVal hash As String) As FileStrc
        Dim i As Int32 = Ed2kIndex(hash)

        Return Structs(i)

    End Function

    Public Function HashExists(ByVal hash As String) As Boolean
        Return Ed2kIndex.ContainsKey(hash)
    End Function

    Public Sub RemoveByHash(ByVal hash As String)

        Dim i As Int32 = Ed2kIndex(hash)
        Dim f As FileStrc = Structs(i)
        If Not f Is Nothing Then
            Structs.RemoveAt(i)
            Ed2kIndex.Remove(hash)
            FileIndex.Remove(f.FilePath.Path)
        End If
    End Sub

End Class

not be suitable? Haven't tested it but it should allow you to maintain 2 internal indexes based on either the hash or filepath - you would just need to expand the example a bit.
 
Ahh now I understand your example.
I'll try that once I have time to code again.

And thanks for your patience. :)
 
Back
Top