Overview of Select
The Select
class in DBL is a powerful and versatile tool for data retrieval, manipulation, and querying. It offers SQL-like capabilities within the DBL environment, allowing developers to perform complex data operations with ease. The Select
class can be used in conjunction with the FOREACH
statement for iterating over data sets, or individually for more specific data operations.
Key components of Select
-
From class: Specifies the data source for the selection. It defines the file or database table to be queried and the structure of the records.
-
Where class: Used to set conditions for data selection. It allows filtering of records based on specified criteria, similar to the WHERE clause in SQL.
-
OrderBy class: Facilitates sorting of the retrieved data. It can order the data based on one or more fields, either in ascending or descending order.
-
GroupBy class: Enables grouping of records based on one or more fields. This is akin to the GROUP BY clause in SQL, providing a way to aggregate data.
-
Join functionality: Allows the combination of records from multiple data sources, akin to JOINs in SQL. This includes capabilities for inner joins and left outer joins.
-
Sparse class: Offers performance optimization, particularly for networked data access with xfServer. It allows partial record retrieval, reducing data transfer over the network.
An example
For starters, we’re going to need some data. Go grab the employee.txt
and employee.xdl
files from the data
folder in the companion repository. This is a simple text file with some employee data in it and a par file to tell fconvert
some info about our record. We’re going to use this to create a simple ISAM file. To do this, we’re going to use the fconvert
utility. This is a command line utility that comes with Synergy/DE. It’s used for creating and manipulating ISAM files. To create our file, make sure you’re in a dbl command prompt, and then we’re going to run the following command:
fconvert -i1 employees.txt -oi employee -d employee.xdl
It shouldn’t output anything to the screen if it worked correctly, but if you look in your project directory, you should see a new file named employee.ism
. This is our ISAM file. Now that we have our data, let’s write a program to read it. Create a new file named select.dbl
in your project directory and add the following code:
main
record emp_rec
EmployeeID, a5
Name, a30
Department, a20
Salary, d7.2
endrecord
proc
record
channel, i4 ; Channel for file access
fromObj, @From ; From object for Select
whereObj, @Where ; Where object for Select
selectObj, @Select
endrecord
; Open the employee file
open(channel=0, I:I, "employee.ism")
; Create a From object for the employee file
fromObj = new From(channel, emp_rec)
; Define a Where clause (e.g., selecting employees from a specific department)
whereObj = (Where)emp_rec.Department.eq."Sales"
; Create a Select object with the From and Where objects and
; an OrderBy clause (e.g., ordering by Salary)
selectObj = new Select(fromObj, whereObj, OrderBy.Descending(emp_rec.Salary))
; Iterate over the selected records
foreach emp_rec in selectObj
begin
; Process each record (e.g., display employee details)
Console.WriteLine( "ID: " + emp_rec.EmployeeID +
& ", Name: " + emp_rec.Name +
& ", Department: " + emp_rec.Department +
& ", Salary: " + %string(emp_rec.Salary))
end
; Close the file channel
close channel
end
Pretty easy right? You’ve already seen most of this stuff plenty of times before, so let’s just talk about a few of the interesting bits. The Where
clause lets you build your filter criteria using regular DBL syntax. Here are a few more valid Where
snippets to drive that home.
whereObj = (Where)emp_rec.Department == "Sales"
whereObj = (Where)emp_rec.Department == "Sales" && emp_rec.Salary.gt.100000
whereObj = (Where)emp_rec.Department.eq."Sales".and.
& emp_rec.Salary > 100000 || emp_rec.Salary < 50000
Notice that you can use the &&
and ||
operators to build more complex criteria. You can also use the eq
and gt
operators instead of ==
and >
. The choice of symbols or words is really a stylistic choice. As a comparison, the query in our program would look something like this in SQL:
SELECT * FROM employee WHERE Department = 'Sales' ORDER BY Salary DESC
From
The From
object acts as the primary binding link between the data area and the data source. When a From
object is created, it is bound to a specific data area, which is defined by a record, group, or structure within the program. This binding process essentially establishes a mapping between the data structure in the program and the corresponding structure of the data source. You will use this bound data area when defining your Where
and OrderBy
clauses, as well as when accessing the data retrieved from the data source. It may feel slightly odd to have the binding expressed in the From
object and also when iterating over the FOREACH loop, but if you follow the pattern of using the From
object to define the data area and then using the data area in the FOREACH loop, you’ll be fine. The From
object maintains a connection to the original data area throughout its lifecycle, and therefore, the data area you specify must live at least as long as the From
object you create.
Changing things in the data
The Select
class is not read only. If your file channel is opened for update, you can effectively use it to update and delete records. Let’s revisit our example from above and add a few lines to update the salary of all the employees in the sales department. First change the OPEN statement to the following:
open(channel=0, U:I, "employee.ism")
Now add the following code after the FOREACH loop:
; Give a 10% raise
emp_rec.Salary = emp_rec.Salary * 1.10
Select.GetEnum().Current = emp_rec
Now if you compile and run the program, you should see the salaries of all the employees in the sales department increase by 10%. If you keep running it, they will just keep going up! You can also use the Select
class to delete records. This one is the easiest of all. Whatever your Select
criteria is, you just have to call selectObj.Delete()
and it will delete all the records that match your criteria. Let’s write a new program to delete all the employees from the “CORP” department.
import Synergex.SynergyDE.Select
main
record emp_rec
EmployeeID, a5
Name, a30
Department, a20
Salary, d9.2
endrecord
record
channel, i4 ; Channel for file access
fromObj, @From ; From object for Select
whereObj, @Where ; Where object for Select
selectObj, @Select
endrecord
proc
; Open the employee file
open(channel=0, U:I, "employee.ism")
; Create a From object for the employee file
fromObj = new From(channel, emp_rec)
; Define a Where clause (e.g., selecting employees from a specific department)
whereObj = (Where)emp_rec.Department.eq."CORP"
; Create a Select object with the From and Where objects
; and an OrderBy clause (e.g., ordering by Salary)
selectObj = new Select(fromObj, whereObj)
Console.WriteLine("Deleted " + %string(selectObj.Delete()) +
& " employees from the CORP department")
close(channel)
endmain
When you run this program the first time, you should see the following output:
Deleted 12 employees from the CORP department
%DBR-S-STPMSG, STOP
If you run it again, it will delete 0 employees, because there are no more employees in the CORP department. The update and delete operations we’ve just shown would look like this in SQL:
UPDATE employee SET Salary = Salary * 1.10 WHERE Department = 'Sales'
DELETE FROM employee WHERE Department = 'CORP'
There are some important things to keep in mind when using Select
over a channel opened for update. First, you need to be aware that each record you read is going to be locked until you move to the next one or finish the loop. If you don’t intend to update records in the loop, you should use the Select
class over a channel opened for input only. Next you need to know how to handle locked records, because there could be someone else with a lock on a record you’re trying to read.
Events
To handle locked records, you can implement a class that derives from the Synergex.SynergyDE.Select.Event
class. This class has just one member, onLock
, which is called when a locked record is encountered by your Select
object. Let’s take a look at a basic implementation of this class:
; Define a class that extends the Event class for handling locks
class LockHandler extends Event
; Override the onLock method to handle the lock event
public override method onLock, boolean
inout lock, n
inout wait, n
in rec, a
in rfa, a
proc
; set wait to 1 to retry the locked operation
wait = 1
; Return true to retry the locked operation
; or false to throw an exception
mreturn true
endmethod
endclass
This class is pretty simple, but you can do a lot with those four arguments. You can use the lock
and wait
parameters to write out what you want the Select
class to do on retry. You can set wait
to 1 and then return true if you want to retry with a wait similar to WAIT: 1
in a READS statement. Since you get the contents of the record in rec
, you can use that to decide on your course of action. You can use the rfa
argument to get the RFA of the record that was locked. You can either store this RFA somewhere to try again later or offer a list of exceptions to the user, depending on what makes sense in your situation. If you want to skip the locked record, you can set lock
to Q_NO_LOCK
or Q_NO_DELETE
, depending on whether this is a read or a delete operation. You can register your event handler with the Select
object by calling selectObj.RegisterEvent(new LockHandler())
prior to starting your Delete()
or FOREACH loop.
N + 1 problem
The N + 1 query problem is a common performance issue in database operations that is particularly evident in scenarios where a loop over a set of records triggers additional queries for each record. This problem is especially pertinent in an environment using xfServer.
The N + 1 query problem occurs when an application makes one query to retrieve N records from a primary table (the “driving table”) and then iterates over these records, making an additional query for each one to retrieve related data from another table. This results in a total of N + 1 queries (1 for the initial fetch and N for each record).
Consider two related tables: Employees
(the driving table) and Departments
. A typical operation might involve fetching all employees and then, for each employee, fetching data from the Departments
table to get department details.
An initial READS
operation retrieves each employee record from the Employees
table. Inside a loop over these employee records, a READ operation is used to fetch the corresponding department information for each employee from the Departments
table. Take a look at this diagram showing the flow of operations:
The approach described above results in one query for the driving table plus one additional query for each record in the driving table. This means if there are 100 employees, the system performs 1 (initial READS) + 100 (individual READs) = 101 database queries. Such a high number of queries can significantly degrade performance, particularly in scenarios with large datasets or over networked database systems like xfServer. Let’s take a look at the solution to this problem.
Addressing the problem with Select Join
- Efficiency of Joins: The
Select
class withJoin
functionality in DBL offers a more efficient way to handle related data from multiple tables. Instead of multiple queries, a singleSelect
statement withJoin
can retrieve all the necessary data in one go. - Reduced queries: This method significantly reduces the number of queries sent to the database, alleviating the performance issues associated with the N + 1 query problem.
- Optimized data retrieval:
Select
with Join optimizes data retrieval, making it particularly beneficial for applications that require data from multiple related tables.
To run this next example, you’ll need to grab the department.txt
and department.xdl
files from the data
directory in the companion repository. These are the same as the employee.txt
and employee.xdl
files we used earlier, but for the departments table. We’re going to use these to create a second ISAM file. Just like before, make sure you’re in a dbl command prompt, and then we’re going to run the following command:
fconvert -i1 departments.txt -oi department -d department.xdl
Now let’s write a program to read both of these files at the same time. Create a new file named join.dbl
in your project directory and add the following code:
import Synergex.SynergyDE.Select
main
record employee
EmployeeID, a5
Name, a30
Department, a20
Salary, d9.2
endrecord
record department
DepartmentID, a20
Description, a2000
endrecord
record
channelEmp, i4 ; Channel for employee file
channelDept, i4 ; Channel for department file
joinSelectObj, @JoinSelect
rowObj, @Rows
endrecord
proc
; Open employee and department files
open(channelEmp, I:I, "DAT:employees.ism")
open(channelDept, I:I, "DAT:departments.ism")
; Create From objects for both files
fromEmp = new From(channelEmp, employee)
fromDept = new From(channelDept, department)
; Create a Select object with an Inner Join
joinSelectObj = new Select(fromEmp.InnerJoin(fromDept,
& (On)(employee.Department == department.DepartmentID))).Join()
; Iterate over the joined records
foreach rowObj in joinSelectObj
begin
rowObj.fill(employee) ; Fill employee record
rowObj.fill(department) ; Fill department record
; Process the joined records (e.g., display details)
Console.WriteLine( "ID: " + employee.EmployeeID +
& ", Name: " + employee.Name +
& ", Department: " + department.DepartmentID +
& ", Department Description: " + %atrim(department.Description))
end
; Close the file channels
close channelEmp
close channelDept
endmain
The condition (employee.Department == department.DepartmentID
) dictates how the rows from the two tables are matched. It essentially says, “Link each employee to their respective department by matching the department ID in the employee record with the department ID in the department record.” Only those employee records that have a corresponding department entry will be included in the result set. When using Select Join, the record that is being matched must use a key. In this case the department.xdl
file specifies DepartmentID
as a key. If you were writing this query in SQL, it would look something like this:
SELECT *
FROM Employees
INNER JOIN Departments
ON Employees.Department = Departments.DepartmentID;
The Select class offers two types of joins: InnerJoin
and LeftJoin
. Let’s take a look at the differences between these two types of joins.
InnerJoin
- Definition: An
InnerJoin
combines rows from different files where there are matching values in both files. - Behavior: If a row in the first file does not have a corresponding match in the second file, it is excluded from the result set.
- Use case: Use
InnerJoin
when you only want to retrieve records that have matching data in both files. - SQL example:
SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.id;
- Result: Only returns rows where there is a match in both
table1
andtable2
.
LeftJoin (Left outer join)
- Definition: A
LeftJoin
returns all rows from the left (first) file and the matched rows from the right (second) table. If there is no match,rowObj.Fill(table2)
returnsfalse
. - Behavior: Includes all rows from the left file, regardless of whether they have matches in the right file.
- Use case: Use
LeftJoin
when you want all records from the left file and only the matched records from the right file. For unmatched entries in the left file, therowObj.Fill(table2)
call will returnfalse
.
Key differences
-
Inclusion of unmatched rows:
InnerJoin
only includes rows with matches in both files.LeftJoin
includes all rows from the left file, regardless of whether they have matches in the right file. -
Result set size:
InnerJoin
can result in a smaller result set if many rows don’t have matches across the files.LeftJoin
ensures that all rows from the left file are present in the result. -
Use in data analysis:
LeftJoin
is particularly useful in data analysis and reporting where you need to maintain all records from one dataset while including related records from another dataset when available.
By understanding the differences between InnerJoin
and LeftJoin
, you can more effectively retrieve the necessary data based on the relationships and matching criteria between different ISAM files.