How to run this example?

If you are using the graphical interface, (1) choose the "Apriori" algorithm, (2) select the input file "contextPasquier99.txt", (3) set the output file name (e.g. "output.txt"), (4) set minsup to 40% and (5) click "Run algorithm".
If you want to execute this example from the command line, run the following command in a folder containing spmf.jar and the example input file contextPasquier99.txt:
java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%
If you are using the source code version of SPMF, launch the file "MainTestApriori_saveToMemory.java" in the package ca.pfv.SPMF.tests.
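For scripted runs, the command line above can be assembled and launched from Python. This is only a minimal sketch: the helper names `build_spmf_command` and `run_spmf` are hypothetical, and the sketch assumes that Java is on the PATH and that spmf.jar sits in the current folder, as the command-line instructions require.

```python
import subprocess


def build_spmf_command(algorithm, input_file, output_file, minsup, jar="spmf.jar"):
    """Return the argument list for the documented SPMF command line."""
    # Mirrors: java -jar spmf.jar run <algorithm> <input> <output> <minsup>
    return ["java", "-jar", jar, "run", algorithm, input_file, output_file, minsup]


def run_spmf(algorithm, input_file, output_file, minsup, jar="spmf.jar"):
    """Launch SPMF and raise CalledProcessError if it exits with an error."""
    cmd = build_spmf_command(algorithm, input_file, output_file, minsup, jar)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_spmf(...) to actually execute it.
    print(" ".join(build_spmf_command(
        "Apriori", "contextPasquier99.txt", "output.txt", "40%")))
```

Passing the arguments as a list avoids shell quoting issues with the "%" character in the minsup value.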
What is Apriori?

Apriori is an algorithm for discovering frequent itemsets in transaction databases. It was proposed by Agrawal & Srikant (1994).

What is the input of the Apriori algorithm?

The input is a transaction database (aka binary context) and a threshold named minsup (a value between 0 and 100 %).

A transaction database is a set of transactions, and each transaction is a set of items. For example, consider the following transaction database. It contains 5 transactions (t1, t2, ..., t5) and 5 items (1, 2, 3, 4, 5). The first transaction, for instance, represents the set of items 1, 3 and 4. This database is provided as the file contextPasquier99.txt in the SPMF distribution. Note that an item is not allowed to appear twice in the same transaction and that the items of a transaction are assumed to be sorted in lexicographical order.

Transaction id    Items
t1                {1, 3, 4}
t2                {2, 3, 5}
t3                {1, 2, 3, 5}
t4                {2, 5}
t5                {1, 2, 3, 5}

What is the output of the Apriori algorithm?

Apriori discovers itemsets (groups of items) that occur frequently in a transaction database (frequent itemsets). A frequent itemset is an itemset that appears in at least minsup transactions of the database, where minsup is a parameter given by the user.

For example, if Apriori is run on the previous transaction database with a minsup of 40 % (2 transactions), Apriori produces the following result:

Itemset         Support
{1}             3
{2}             4
{3}             4
{5}             4
{1, 2}          2
{1, 3}          3
{1, 5}          2
{2, 3}          3
{2, 5}          4
{3, 5}          3
{1, 2, 3}       2
{1, 2, 5}       2
{1, 3, 5}       2
{2, 3, 5}       3
{1, 2, 3, 5}    2
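The result above can be reproduced with a compact level-wise search. The following is an illustrative Python sketch of the general Apriori idea (join, prune, count), not SPMF's Java implementation; the function name `apriori` and its `min_count` parameter (minsup expressed as a number of transactions) are assumptions made for this example.

```python
from itertools import combinations

# The example database (contextPasquier99.txt), one set per transaction.
DATABASE = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 2, 3, 5}]


def apriori(transactions, min_count):
    """Return {itemset as sorted tuple: support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {k: c for k, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        prev = sorted(level)
        candidates = set()
        # Join step: merge two frequent (k-1)-itemsets sharing their first k-2 items.
        for a, b in combinations(prev, 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(sub in level for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        # Count step: scan the database once per candidate.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if t.issuperset(cand))
            if support >= min_count:
                level[cand] = support
        frequent.update(level)
        k += 1
    return frequent


if __name__ == "__main__":
    result = apriori(DATABASE, min_count=2)
    for itemset in sorted(result, key=lambda s: (len(s), s)):
        print(set(itemset), result[itemset])
```

Item 4 occurs in a single transaction, so it is discarded at the first level and never appears in a larger candidate.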
How should I interpret the results?

In the results, each itemset is annotated with its support. The support of an itemset is the number of transactions in the database that contain it. For example, the itemset {2, 3, 5} has a support of 3 because it appears in transactions t2, t3 and t5. It is a frequent itemset because its support is greater than or equal to the minsup parameter.
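The support computation and the frequency test can be checked by hand with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, and the percentage is assumed to be rounded up to a whole number of transactions (for this example 40 % of 5 transactions is exactly 2, as stated in the text).

```python
# The example database, keyed by transaction id.
DATABASE = {
    "t1": {1, 3, 4},
    "t2": {2, 3, 5},
    "t3": {1, 2, 3, 5},
    "t4": {2, 5},
    "t5": {1, 2, 3, 5},
}


def containing(itemset, database):
    """Ids of the transactions that contain every item of the itemset."""
    items = set(itemset)
    return [tid for tid, t in database.items() if items <= t]


def min_count_from_percent(percent, n_transactions):
    """Convert a minsup percentage to a transaction count (rounded up)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-percent * n_transactions // 100)


if __name__ == "__main__":
    tids = containing({2, 3, 5}, DATABASE)
    threshold = min_count_from_percent(40, len(DATABASE))
    print(tids, len(tids), len(tids) >= threshold)
```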
Input file format

The input file format for Apriori is defined as follows. It is a text file. An item is represented by a positive integer. A transaction is a line in the text file. In each line (transaction), items are separated by a single space. It is assumed that all items within the same transaction (line) are sorted according to a total order (e.g. ascending order) and that no item appears twice within the same line.

For example, for the previous example, the input file is defined as follows:

1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5
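A small reader for this format might look as follows. It is a sketch, not SPMF code: the function name `parse_transactions` is hypothetical, and the validation assumes ascending numeric order, which the text gives only as an example of a total order.

```python
def parse_transactions(text):
    """Parse the space-separated format: one transaction per line."""
    transactions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()  # tolerant of stray leading/trailing spaces
        if not tokens:
            continue  # skip empty lines
        items = [int(tok) for tok in tokens]
        if any(item <= 0 for item in items):
            raise ValueError(f"line {line_no}: items must be positive integers")
        # Strictly ascending means sorted and free of duplicates.
        if any(a >= b for a, b in zip(items, items[1:])):
            raise ValueError(f"line {line_no}: items must be ascending, no repeats")
        transactions.append(items)
    return transactions


# Content of the example input file shown above.
EXAMPLE = "1 3 4\n2 3 5\n1 2 3 5\n2 5\n1 2 3 5\n"

if __name__ == "__main__":
    for transaction in parse_transactions(EXAMPLE):
        print(transaction)
```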

Note that it is also possible to use the ARFF format as an alternative to the default input format. The specification of the ARFF format can be found here. Most features of the ARFF format are supported, except that (1) the character "=" is forbidden and (2) escape characters are not considered. When the ARFF format is used, the data mining algorithms run slightly slower than with the native SPMF file format, because the input file is automatically converted before the algorithm is launched and the result also has to be converted. This cost, however, should be small.

Output file format

The output file format is defined as follows. It is a text file in which each line represents a frequent itemset. On each line, the items of the itemset are listed first. Each item is represented by an integer and is followed by a single space. After all the items, the keyword "#SUP:" appears, followed by an integer indicating the support of the itemset, expressed as a number of transactions. For example, here is the output file for this example. The first line indicates the frequent itemset consisting of the item 1 and shows that this itemset has a support of 3 transactions.

1 #SUP: 3
2 #SUP: 4
3 #SUP: 4
5 #SUP: 4
1 2 #SUP: 2
1 3 #SUP: 3
1 5 #SUP: 2
2 3 #SUP: 3
2 5 #SUP: 4
3 5 #SUP: 3
1 2 3 #SUP: 2
1 2 5 #SUP: 2
1 3 5 #SUP: 2
2 3 5 #SUP: 3
1 2 3 5 #SUP: 2
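Lines in this output format are easy to read back into a program. The sketch below uses hypothetical helper names and assumes the default integer item format; with ARFF input, where items are written as strings, the `int` conversion of the items would have to be dropped.

```python
def parse_output_line(line):
    """Split '1 2 3 5 #SUP: 2' into ((1, 2, 3, 5), 2)."""
    items_part, support_part = line.split("#SUP:")
    itemset = tuple(int(tok) for tok in items_part.split())
    return itemset, int(support_part)


def format_output_line(itemset, support):
    """Inverse of parse_output_line: items, then the #SUP: keyword."""
    return " ".join(str(item) for item in itemset) + " #SUP: " + str(support)


if __name__ == "__main__":
    for line in ["1 #SUP: 3", "2 5 #SUP: 4", "1 2 3 5 #SUP: 2"]:
        itemset, support = parse_output_line(line)
        print(itemset, support)
```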

Note that if the ARFF format is used as input instead of the default input format, the output format will be the same except that items will be represented by strings instead of integers.

Performance

Apriori is an important algorithm for historical reasons and also because it is simple and easy to learn. However, faster and more memory-efficient algorithms have been proposed. If efficiency is required, it is recommended to use a more efficient algorithm such as FPGrowth instead of Apriori. You can see a performance comparison of Apriori, FPGrowth and other frequent itemset mining algorithms in the "performance" section of this website.

Implementation details

In SPMF, there is also an implementation of Apriori that uses a hash-tree as an internal structure to store candidates. This structure provides a more efficient way to count the support of itemsets. This version of Apriori is named "Apriori_with_hash_tree" in the GUI of SPMF and on the command line. For the source code version, it can be run by executing the test file MainTestAprioriHT_saveToFile.java. This version can be up to twice as fast as the regular version in some cases, but it uses more memory. It has two parameters: (1) minsup and (2) the number of child nodes that each node in the hash-tree should have. For the second parameter, we suggest using the value 30.

Where can I get more information about the Apriori algorithm?

This is the technical report published in 1994 describing Apriori.

R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. Research Report RJ 9839, IBM Almaden Research Center, San Jose, California, June 1994.

You can also read chapter 6 of the book "Introduction to Data Mining", which provides a nice and easy-to-understand introduction to Apriori.

Example 2: Mining Frequent Itemsets by Using the AprioriTID Algorithm

How to run this example?

If you are using the graphical interface, (1) choose the "Apriori_TID" algorithm, (2) select the input file "contextPasquier99.txt", (3) set the output file name (e.g. "output.txt"), (4) set minsup to 40% and (5) click "Run algorithm".
If you want to execute this example from the command line, run the following command in a folder containing spmf.jar and the example input file contextPasquier99.txt:
java -jar spmf.jar run Apriori_TID contextPasquier99.txt output.txt 40%
If you are using the source code version of SPMF, launch the file "MainTestAprioriTID_saveToFile.java" in the package ca.pfv.SPMF.tests.

What is AprioriTID?

AprioriTID is an algorithm for discovering frequent itemsets (groups of items appearing frequently) in a transaction database. It was proposed by Agrawal & Srikant (1994).

AprioriTID is a variation of the Apriori algorithm. It was proposed in the same article as Apriori, as an alternative implementation. It produces the same output as Apriori but uses a different mechanism for counting the support of itemsets.
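The text does not spell out the counting mechanism. One common TID-based idea, shown here purely as an assumption and not as a description of the original article or of SPMF's code, is to store for each item the set of ids of the transactions that contain it, and to obtain the support of an itemset by intersecting those sets instead of rescanning the database. The helper names are hypothetical.

```python
# The example database; transaction ids are assigned as 1..5 (t1..t5).
DATABASE = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 2, 3, 5}]


def build_tidsets(transactions):
    """Map each item to the set of transaction ids (tids) that contain it."""
    tidsets = {}
    for tid, items in enumerate(transactions, start=1):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support_from_tidsets(itemset, tidsets):
    """Support = size of the intersection of the items' tid sets."""
    common = set.intersection(*(tidsets[item] for item in itemset))
    return len(common), common


if __name__ == "__main__":
    tidsets = build_tidsets(DATABASE)
    print(support_from_tidsets((2, 3, 5), tidsets))
```

With this representation the database is read once to build the tid sets, and every later support count is a set intersection.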

What is the input of the AprioriTID algorithm?

The input is a transaction database (aka binary context) and a threshold named minsup (a value between 0 and 100 %).

A transaction database is a set of transactions, and each transaction is a set of items. For example, consider the following transaction database. It contains 5 transactions (t1, t2, ..., t5) and 5 items (1, 2, 3, 4, 5). The first transaction, for instance, represents the set of items 1, 3 and 4. This database is provided as the file contextPasquier99.txt in the SPMF distribution. Note that an item is not allowed to appear twice in the same transaction and that the items of a transaction are assumed to be sorted in lexicographical order.

Transaction id    Items
t1                {1, 3, 4}
t2                {2, 3, 5}
t3                {1, 2, 3, 5}
t4                {2, 5}
t5                {1, 2, 3, 5}

What is the output of the AprioriTID algorithm?

AprioriTID discovers itemsets (groups of items) that occur frequently in a transaction database (frequent itemsets). A frequent itemset is an itemset that appears in at least minsup transactions of the database, where minsup is a parameter given by the user.

For example, if AprioriTID is run on the previous transaction database with a minsup of 40 % (2 transactions), AprioriTID produces the following result:

Itemset         Support
{1}             3
{2}             4
{3}             4
{5}             4
{1, 2}          2
{1, 3}          3
{1, 5}          2
{2, 3}          3
{2, 5}          4
{3, 5}          3
{1, 2, 3}       2
{1, 2, 5}       2
{1, 3, 5}       2
{2, 3, 5}       3
{1, 2, 3, 5}    2

How should I interpret the results?

In the results, each itemset is annotated with its support. The support of an itemset is the number of transactions in the database that contain it. For example, the itemset {2, 3, 5} has a support of 3 because it appears in transactions t2, t3 and t5. It is a frequent itemset because its support is greater than or equal to the minsup parameter.

Input file format

The input file format used by AprioriTID is defined as follows. It is a text file. An item is represented by a positive integer. A transaction is a line in the text file. In each line (transaction), items are separated by a single space. It is assumed that all items within the same transaction (line) are sorted according to a total order (e.g. ascending order) and that no item appears twice within the same line.

For example, for the previous example, the input file is defined as follows:

1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5





Làm thế nào để chạy ví dụ này? Nếu bạn đang sử dụng giao diện đồ họa, (1) chọn "Apriori" thuật toán, (2) chọn các tập tin đầu vào "contextPasquier99.txt", (3) thiết lập tên tập tin đầu ra (ví dụ như "đầu ra. txt ") (4) bộ minsup đến 40% và (5) nhấn" Run thuật toán ". Nếu bạn muốn thực hiện ví dụ này từ dòng lệnh, sau đó thực hiện lệnh này: java -jar spmf.jar chạy Apriori contextPasquier99.txt đầu ra .txt 40% trong một thư mục có chứa spmf.jar và contextPasquier99.txt dụ tập tin đầu vào. Nếu bạn đang sử dụng phiên bản mã nguồn của SPMF, khởi chạy tập tin "MainTestApriori_saveToMemory.java" trong ca.pfv.SPMF.tests gói. Apriori là gì? Apriori là một thuật toán để phát hiện các tập phổ biến trong cơ sở dữ liệu giao dịch. Đó là đề xuất của Agrawal & Srikant (1993). đầu vào của thuật toán Apriori là gì? Đầu vào là một cơ sở dữ liệu giao dịch (context aka nhị phân) và một ngưỡng minsup tên (một giá trị từ 0 đến 100%). Một cơ sở dữ liệu giao dịch là một tập hợp các giao dịch. Mỗi giao dịch là một tập hợp của các mặt hàng. Ví dụ, hãy xem xét các cơ sở dữ liệu giao dịch sau. Nó chứa 5 giao dịch (t1, t2, ..., t5) và 5 mặt hàng (1,2, 3, 4, 5). Ví dụ, các giao dịch đầu tiên đại diện cho tập hợp các mục 1, 3 và 4. cơ sở dữ liệu này được cung cấp như các contextPasquier99.txt tập tin trong phân phối SPMF. Điều quan trọng cần lưu ý là một item không được phép xuất hiện hai lần trong cùng một giao dịch và các mặt hàng được cho là được sắp xếp theo thứ tự từ điển trong một giao dịch. id giao dịch Items t1 {1, 3, 4} {t2 2, 3, 5} t3 {1, 2, 3, 5} {2 t4, 5} t5 {1, 2, 3, 5} đầu ra của thuật toán Apriori là gì? Apriori là một thuật toán để phát hiện các tập phổ biến (nhóm các mặt hàng) xảy ra thường xuyên trong một cơ sở dữ liệu giao dịch (tập phổ biến). Một tập phổ biến là một tập phổ biến xuất hiện trong các giao dịch ít nhất minsup từ các cơ sở dữ liệu giao dịch, nơi minsup là một tham số được đưa ra bởi người sử dụng. 
Ví dụ, nếu Apriori đang chạy trên cơ sở dữ liệu giao dịch trước đó với một minsup 40% (2 giao dịch), Apriori sản xuất các kết quả sau: hỗ trợ tập phổ biến {1} 3 {2} 4 {3} 4 {5} 4 {1, 2} 2 {1, 3} 3 {1, 5} 2 {2, 3} 3 {2 , 5} 4 {3, 5} 3 {1, 2, 3} 2 {1, 2, 5} 2 {1, 3, 5} 2 {2, 3, 5} 3 {1, 2, 3, 5 } 2 Làm thế nào tôi nên giải thích kết quả? Trong các kết quả, mỗi tập phổ biến được chú thích với sự hỗ trợ của nó. Sự hỗ trợ của một tập phổ biến là bao nhiêu lần các itemset xuất hiện trong cơ sở dữ liệu giao dịch. Ví dụ, các itemset {2, 3 5} có một sự hỗ trợ của 3 vì nó xuất hiện trong các giao dịch t2, t3 và t5. Nó là một tập phổ biến bởi vì hỗ trợ của nó là cao hơn hoặc bằng với minsup tham số. định dạng tập tin đầu vào Các định dạng tập tin đầu vào cho Apriori được định nghĩa như sau. Nó là một tập tin văn bản. Một tiết mục được biểu diễn bởi một số nguyên dương. Một giao dịch là một dòng trong tập tin văn bản. Trong mỗi dòng (giao dịch), các mục được phân cách bởi một dấu cách trống. Nó được giả định rằng tất cả các mục trong cùng một giao dịch (line) đều được sắp xếp theo một trật tự toàn (ví dụ như thứ tự tăng dần) và không có mặt hàng có thể xuất hiện hai lần trong cùng một dòng. Ví dụ, đối với ví dụ trước đây, các tập tin đầu vào được định nghĩa như sau: 1 3 4 2 3 5 1 2 3 5 2 5 1 2 3 5 Lưu ý rằng nó cũng có thể sử dụng các định dạng ARFF như một thay thế cho định dạng đầu vào mặc định. Các đặc điểm kỹ thuật của các định dạng ARFF có thể được tìm thấy ở đây. Hầu hết các tính năng của các định dạng được hỗ trợ ARFF trừ rằng (1) nhân vật "=" bị cấm và (2) thoát khỏi nhân vật không được xem xét. Lưu ý rằng khi các định dạng ARFF được sử dụng, hiệu suất của các thuật toán khai thác dữ liệu sẽ thấp hơn một chút so với khi các định dạng tập tin SPMF bản địa được sử dụng bởi vì một sự chuyển đổi của tập tin đầu vào sẽ được tự động thực hiện trước khi tung ra thuật toán và kết quả cũng sẽ có được chuyển đổi. Chi phí này tuy nhiên phải nhỏ. 
Output file format

The output file format is defined as follows. It is a text file, where each line represents a frequent itemset. On each line, the items of the itemset are listed first. Each item is represented by an integer and is followed by a single space. After all the items, the keyword "#SUP:" appears, followed by an integer indicating the support of the itemset, expressed as a number of transactions. For example, here is the output file for this example. The first line indicates the frequent itemset consisting of the item 1, and it indicates that this itemset has a support of 3 transactions.

1 #SUP: 3
2 #SUP: 4
3 #SUP: 4
5 #SUP: 4
1 2 #SUP: 2
1 3 #SUP: 3
1 5 #SUP: 2
2 3 #SUP: 3
2 5 #SUP: 4
3 5 #SUP: 3
1 2 3 #SUP: 2
1 2 5 #SUP: 2
1 3 5 #SUP: 2
2 3 5 #SUP: 3
1 2 3 5 #SUP: 2

Note that if the ARFF format is used as input instead of the default input format, the output format will be the same, except that items will be represented by strings instead of integers.

Performance

Apriori is an important algorithm for historical reasons and also because it is a simple algorithm that is easy to learn. However, faster and more memory-efficient algorithms have been proposed. If efficiency is required, it is recommended to use a more efficient algorithm such as FPGrowth instead of Apriori. You can see a performance comparison of Apriori, FPGrowth, and other frequent itemset mining algorithms by clicking on the "performance" section of this website.

Implementation details

In SPMF, there is also an implementation of Apriori that uses a hash-tree as an internal structure to store candidates. This structure provides a more efficient way to count the support of itemsets. This version of Apriori is named "Apriori_with_hash_tree" in the GUI of SPMF and on the command line. For the source code version, it can be run by executing the test file MainTestAprioriHT_saveToFile.java.
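As an aside on the output file format described earlier in this example, a line such as "2 3 5 #SUP: 3" can be read back with a few lines of Java. The sketch below is only an illustration for integer items: the class, its Entry type and its method names are hypothetical and are not part of SPMF.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names are hypothetical, not SPMF's API.
class OutputLineSketch {

    /** A parsed output line: the items of the itemset and its support count. */
    static final class Entry {
        final List<Integer> items;
        final int support;

        Entry(List<Integer> items, int support) {
            this.items = items;
            this.support = support;
        }
    }

    /** Splits a line at the "#SUP:" keyword and parses both halves as integers. */
    static Entry parse(String line) {
        int marker = line.indexOf("#SUP:");
        if (marker < 0) {
            throw new IllegalArgumentException("missing #SUP: keyword: " + line);
        }
        List<Integer> items = new ArrayList<>();
        for (String token : line.substring(0, marker).trim().split("\\s+")) {
            items.add(Integer.parseInt(token));
        }
        int support = Integer.parseInt(line.substring(marker + "#SUP:".length()).trim());
        return new Entry(items, support);
    }

    /** Writes items, each followed by a single space, then the keyword and the support. */
    static String format(List<Integer> items, int support) {
        StringBuilder sb = new StringBuilder();
        for (int item : items) {
            sb.append(item).append(' ');
        }
        return sb.append("#SUP: ").append(support).toString();
    }

    public static void main(String[] args) {
        Entry entry = parse("2 3 5 #SUP: 3");
        System.out.println(entry.items + " has support " + entry.support);
        System.out.println(format(entry.items, entry.support));
    }
}
```

When the ARFF input format is used, items appear as strings in the output, so the integer parsing in this sketch would need to be replaced.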
The hash-tree version of Apriori can be up to twice as fast as the regular version in some cases, but it uses more memory. It has two parameters: (1) minsup and (2) the number of child nodes that each node in the hash-tree should have. For the second parameter, we suggest using the value 30.

Where can I get more information about the Apriori algorithm?

This is the technical report published in 1994 describing Apriori:

R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. Research Report RJ 9839, IBM Almaden Research Center, San Jose, California, June 1994.

You can also read chapter 6 of the book "Introduction to Data Mining", which provides a nice and easy-to-understand introduction to Apriori.

Example 2: Mining Frequent Itemsets by Using the AprioriTid Algorithm

How to run this example?

If you are using the graphical interface, (1) choose the "Apriori_TID" algorithm, (2) select the input file "contextPasquier99.txt", (3) set the output file name (e.g. "output.txt"), (4) set minsup to 40% and (5) click "Run algorithm".
If you want to execute this example from the command line, then execute this command:
java -jar spmf.jar run Apriori_TID contextPasquier99.txt output.txt 40%
in a folder containing spmf.jar and the example input file contextPasquier99.txt.
If you are using the source code version of SPMF, launch the file "MainTestAprioriTID_saveToFile.java" in the package ca.pfv.SPMF.tests.

What is AprioriTID?

AprioriTID is an algorithm for discovering frequent itemsets (groups of items appearing frequently) in a transaction database. It was proposed by Agrawal & Srikant (1994).

AprioriTID is a variation of the Apriori algorithm. It was proposed in the same article as Apriori, as an alternative implementation of Apriori. It produces the same output as Apriori, but it uses a different mechanism for counting the support of itemsets.
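The text above does not spell out the counting mechanism. One common way to count support without rescanning the database is to keep, for each item, the set of ids of the transactions that contain it (a "tid set"), and to obtain the support of an itemset by intersecting the tid sets of its items. The Java sketch below illustrates that general idea only. It is an assumption made for illustration, with hypothetical names, and is not claimed to match the bookkeeping of the original paper or of SPMF's AprioriTID implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: names are hypothetical, not SPMF's API.
class TidSetSketch {

    /** Maps each item to the set of 0-based indexes of the transactions containing it. */
    static Map<Integer, BitSet> buildTidSets(List<List<Integer>> database) {
        Map<Integer, BitSet> tidSets = new TreeMap<>();
        for (int tid = 0; tid < database.size(); tid++) {
            for (int item : database.get(tid)) {
                tidSets.computeIfAbsent(item, key -> new BitSet()).set(tid);
            }
        }
        return tidSets;
    }

    /** Support of an itemset = size of the intersection of its items' tid sets. */
    static int support(List<Integer> itemset, Map<Integer, BitSet> tidSets) {
        BitSet common = null;
        for (int item : itemset) {
            BitSet tids = tidSets.get(item);
            if (tids == null) {
                return 0; // the item never occurs in the database
            }
            if (common == null) {
                common = (BitSet) tids.clone(); // copy so the stored tid set is not modified
            } else {
                common.and(tids);
            }
        }
        // This sketch returns 0 for an empty itemset.
        return common == null ? 0 : common.cardinality();
    }

    public static void main(String[] args) {
        List<List<Integer>> database = new ArrayList<>();
        database.add(Arrays.asList(1, 3, 4));    // t1
        database.add(Arrays.asList(2, 3, 5));    // t2
        database.add(Arrays.asList(1, 2, 3, 5)); // t3
        database.add(Arrays.asList(2, 5));       // t4
        database.add(Arrays.asList(1, 2, 3, 5)); // t5
        Map<Integer, BitSet> tidSets = buildTidSets(database);
        System.out.println("support of {2, 3, 5}: " + support(Arrays.asList(2, 3, 5), tidSets));
    }
}
```

With this representation the database is scanned once to build the tid sets, and every later support count is a set intersection.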
What is the input of the AprioriTID algorithm?

The input is a transaction database (aka binary context) and a threshold named minsup (a value between 0 and 100 %).

A transaction database is a set of transactions. Each transaction is a set of items. For example, consider the following transaction database. It contains 5 transactions (t1, t2, ..., t5) and 5 items (1, 2, 3, 4, 5). For example, the first transaction represents the set of items 1, 3 and 4. This database is provided as the file contextPasquier99.txt in the SPMF distribution. It is important to note that an item is not allowed to appear twice in the same transaction and that items are assumed to be sorted by lexicographical order in a transaction.

Transaction id Items
t1 {1, 3, 4}
t2 {2, 3, 5}
t3 {1, 2, 3, 5}
t4 {2, 5}
t5 {1, 2, 3, 5}

What is the output of the AprioriTID algorithm?

AprioriTID is an algorithm for discovering itemsets (groups of items) occurring frequently in a transaction database (frequent itemsets). A frequent itemset is an itemset appearing in at least minsup transactions of the transaction database, where minsup is a parameter given by the user.

For example, if AprioriTID is run on the previous transaction database with a minsup of 40 % (2 transactions), AprioriTID produces the following result:

itemsets support
{1} 3
{2} 4
{3} 4
{5} 4
{1, 2} 2
{1, 3} 3
{1, 5} 2
{2, 3} 3
{2, 5} 4
{3, 5} 3
{1, 2, 3} 2
{1, 2, 5} 2
{1, 3, 5} 2
{2, 3, 5} 3
{1, 2, 3, 5} 2

How should I interpret the results?

In the results, each itemset is annotated with its support. The support of an itemset is the number of transactions of the database in which the itemset appears. For example, the itemset {2, 3, 5} has a support of 3 because it appears in transactions t2, t3 and t5. It is a frequent itemset because its support is higher than or equal to the minsup parameter.

Input file format

The input file format used by AprioriTID is defined as follows.
It is a text file. An item is represented by a positive integer. A transaction is a line in the text file. In each line (transaction), items are separated by a single space. It is assumed that all items within the same transaction (line) are sorted according to a total order (e.g. ascending order) and that no item can appear twice within the same line.

For example, for the previous example, the input file is defined as follows:

1 3 4
2 3 5
1 2 3 5
