Cybernetics and programmingПравильная ссылка на статью:
Relational database problems / Проблемы реляционных баз данных
Дата направления статьи в редакцию:11-10-2020
Аннотация: Актуальность данной статьи заключается в том, что сегодняшние базы данных являются основой множества информационных систем. Накопленная в них информация представляет собой чрезвычайно ценный материал, и сегодня широко распространены методы обработки баз данных с точки зрения извлечения из них дополнительных методов, знаний, которые взаимосвязаны с обобщением и различными дополнительными методами обработки информации.Объектом исследования в работе являются реляционные базы данных и СУБД, предметом исследования - особенности их использования в прикладном программировании.В соответствии с поставленной целью необходимо решить следующие задачи:1) рассмотреть понятие и сущность реляционной базы данных;2) проанализировать проблемные аспекты реляционных баз данных в современных условиях. Реляционные базы данных являются одними из самых распространенных, благодаря своей простоте и понятности на этапе создания и на уровне пользователя. Также следует отметить, что основным преимуществом RDB является ее совместимость с основным языком запросов SQL, который интуитивно понятен для пользователей. Тем не менее, при всем разнообразии подходов все же существуют некоторые каноны, нарушение которых сильно влияет как на дизайн базы данных, так и на ее работу. Так, например, очень актуальна проблема нормализации базы данных. Пренебрежение нормализацией делает структуру базы данных запутанной, а саму базу данных ненадежной. Перспективные направления включают разработку запросов к реляционной базе данных с использованием эвристических методов, а также метод накопления ранее оптимизированных запросов с последующей проверкой выводимости текущего запроса из накопленных. Наконец, мы, скорее всего, наблюдаем очень медленный закат реляционных баз данных. Хотя они по-прежнему являются основным средством хранения информации, особенно в крупных корпоративных проектах, они постепенно заменяются нереляционными решениями, которые со временем станут большинством.
Ключевые слова: реляционные базы данных, СУБД, кортежи, ограничения целостности данных, идентификация отношений, неопределённые значения, нормализация, денормализация, внешний ключ, основной ключ
Abstract: The relevance of this article lies in the fact that today's databases are the basis of numerous information systems. The information accumulated in them is extremely valuable material, and today database processing methods are widely spread in terms of extracting additional methods, knowledge from them, which are interconnected with generalization and various additional methods of information processing.The object of research in this work is relational databases and DBMS, the subject of research is the features of their use in applied programming.In accordance with the set goal, it is necessary to solve the following tasks:1) to consider the concept and essence of a relational database;2) to analyze the problematic aspects of relational databases in modern conditions. Relational databases are among the most widespread due to their simplicity and clarity at the creation stage and at the user level. It should also be noted that the main advantage of RDB is its compatibility with the main query language SQL, which is intuitive for users.Nevertheless, with all the variety of approaches, there are still some canons, violation of which greatly affects both the design of the database and its operation. For example, the problem of database normalization is very relevant. Neglecting normalization makes the database structure confusing and the database itself unreliable.Promising directions include the development of queries to a relational database using heuristic methods, as well as the method of accumulating previously optimized queries with subsequent verification of the derivability of the current query from the accumulated ones.Finally, a very slow decline in relational databases is probably happening. While they are still the primary storage medium, especially in large enterprise projects, they are gradually being replaced by non-relational solutions that will become the majority over time.
Keywords:undefined values, identifying relationship, data integrity constraints, tuples, DBMS, relational databases, normalization, denormalization, foreign key, primary key
1. Literature review
The leading direction in the organization of intramachine information support is the technology of banks and databases. An information database is a specialized storage of information resources in the form of an integrated set of files, which provides convenient interaction between them and quick access to data.
An information database management system (DBMS) is a set of software and language tools that provide the formation and introduction of data arrays of information. Processing and issuing the necessary information base for a group of users or management tasks is carried out using software systems for managing the information base.
The relational model was proposed by E. F. Codd in 1970  as a means of structuring data information, which is based on strict mathematical principles.
The relational model requires that the data types of the information used are simple. For a relational information data model, the type of information data used is not in itself important. The requirement that the information data type be simple should be understood so that the internal structure of the information data should not be taken into account in relational operations . Of course, actions that can be performed with information as a whole should be described, for example, data of a numeric type can be added, for strings, a concatenation operation is possible, etc.
The relational model is currently the most common in databases. Such scientists as R. Snodgrass, K. Jensen, J. Ben-Zvi, C. Gadia, E. McKenzie, A. Steiner contributed to the development of methods for storing and processing temporal data based on the relational model. Database research has led to a variety of relevant models .
Relational systems use structures (tables) for storing and working with data. Each column (attribute) contains its own type of information. Each record in the database with a unique key is transferred to a table row, and its attributes are displayed in the table columns .
Each element forming a record must satisfy a certain data type (integer, date, etc.). Different RDBMSs use different types of data that are not always interchangeable.
These kinds of limitations are common in relational databases. In fact, they form the essence of the relationship.
The relational model focuses on organizing data in the form of two-dimensional tables. Each relational table is a two-dimensional array and has the following properties:
• Each element of the table is one data element
• Each column has its own unique name
• There are no identical rows in the table
• All columns in the table are homogeneous, that is, all elements in the column are of the same type
• The order of rows and columns can be arbitrary .
Let's give an example.
Let's say you want to create a database for an internet forum. The forum has registered users who create topics and leave messages in these topics. All this information should be placed in the database.
In theory, everything can be arranged in one table, namely:
However, such an arrangement is contrary to atomicity, and in the columns "Created messages" and "Created topics" an unlimited number of values are possible. It is most expedient to split the table into three:
The Users table is now compliant. But the tables "Messages" and "Topics" - no, because there should not be 2 identical lines. In our case, the same user can write 2 identical messages:
Note also that each post must be related to a topic. To address this issue, relational databases use keys.
Primary key (PK, primary key) - a column whose values are different in all rows. RCs are logical (natural) and surrogate (artificial) .
For example, for the "Users" table, the primary key can be the e-mail column, since there are no two users with the same e-mail .
In practice, it is recommended to use surrogate keys for storing and processing data (their use will allow to abstract the RK from real data). This is important if the user suddenly changes his e-mail, but the primary keys cannot be changed .
A surrogate key is an additional field in the database. Usually this is a unique id (ordinal number of the record), although the principle may be different, the main thing is uniqueness .
We introduce primary keys into our tables:
It should be noted that each record in the table is unique. All that remains is to establish a correspondence between posts and topics using the primary keys. Add one more field to the table with messages:
Now it becomes clear that the message id = 2 belongs to the topic "About fishing" (id = 4), which was created by author 1, and the rest belong to the topic "About fishing", created by author 4 (id = 1). Such a field will be called a foreign key (FK, foreign key). In this case, each value of this field is compared with some primary key from the "Topics" table. The result is a one-to-one correspondence between topics and messages.
Let's say you add a new user named author 2.
How to find out which of the "2 authors" left a message? For this field "Author" in our tables "Messages" and "Topics" we will also make foreign keys:
So the database is ready. Schematically, it looks like this:
This small database has only 3 tables. Any relational database design begins with the development of a conceptual data model.
A conceptual model is understood as a reflection of the domain for the database being developed. Without going into theory, we are talking about a certain diagram with generally accepted designations: - things are designated by rectangles; - object attributes by ovals; - links in tables with rhombuses; - power and direction of bonds by arrows (single, double) .
A simple example is an online store. It contains goods supplied by suppliers and ordered by customers. These are three objects and two links:
When making a delivery, the supplier confirms it with documents. Likewise with the buyer. Thus, both supply and purchase can be viewed as separate entities.
Total 5 objects and 4 links. Among them: - 2 relationships of the "one to many" type (one supplier can make several deliveries; one buyer can make several purchases); - 2 relationships of the "many-to-many" type (each delivery may include several products, and the same product may be in several deliveries; a similar situation for the "Purchase - Product" line) .
But let's remember that many-to-many relationships are not allowed in relational data models, so such relationships should be changed to one-to-many relationships. We do this by adding an intermediate object:
We see that 2 more objects have appeared in the structure - "Supply Journal" and "Purchase Journal" with "one-to-many" relationships (each magazine can include several deliveries / purchases, but each delivery / purchase includes only one magazine).
Each online store object has its own attributes:
As a result, have created a conceptual model of the future database. More precisely, we are talking only about a part of the database, since we did not take into account warehouses, employees, etc. Actually, with a vast subject area, it is better to split the data into several local areas. As a rule, the volume should be within 5-7 objects. And only after the creation of local models, they are combined into a common complex scheme. In our case, we will restrict ourselves to the created model. However, now let's convert it to a relational data model .
Converting a conceptual data model to a relational data model is an important part of database design. The process includes: - building a set of preliminary tables; - indication of the RK; - performing normalization.
Our objects are composed of a set of tables, and object attributes are composed of table fields:
So, we have decided on tables, fields, RK and FK. It should be noted that in the tables "Purchase journal" and "Delivery journal" RK are composite, since they consist of 2 fields.
As for normalization, it is understood as a reversible and step-by-step process, in which the original schema is changed by another schema, in which tables are characterized by a simpler and more logical structure. This is necessary for the following reasons: 1. Elimination of data redundancy . Let's remember our table:
Obviously, in the "Themes" field, the same names appear regularly. Additional memory resources are required to store such data. In addition, when duplicating data, you can make an error when entering attribute values, as a result of which the database will go into a state of inconsistency. 2. Elimination of various anomalies related to updating, deleting, modifying, etc. An example of a modification anomaly - to change the name of a topic, we will have to look at all the lines and change the name in each of them .
Normalization happens: - 1st normal form (1NF); - 2NF; - 3NF; - NFBC (Boyce-Codd normal form); - 4NF; - 5NF.
Each form imposes certain restrictions on data at different levels. During normalization, the database becomes stricter and less susceptible to anomalies.
If we talk about relational databases, then the minimum is 1NF. However, in the design process, DBMS specialists strive to normalize the database at least to the 3NF level, thereby eliminating data redundancy and anomalies. This is important if we strive to obtain a high-quality design result. However, a detailed description of data normalization is beyond the scope of our article, so let's just see what our base will look like at the 3NF level :
So, during the design process, we converted the conceptual model to a relational one. The next stage is its implementation in a specific DBMS. This will require both the DBMS itself and knowledge of the SQL language. For example, MySQL DBMS or some other DBMS is perfect.
3. The discussion of the results
Database design is a slow and time-consuming process. During design, you need to know the subject area well, take into account all the nuances. All information should be displayed in the form of elements such as objects, attributes, links, and design is successful only when everything is done as rationally as possible .
In general, the views on design among developers can vary. Some ignore theory, guided only by experience and common sense. Others, during the design process, assign the main role to intuition, considering design to be an art that not everyone knows. Be that as it may, knowledge is never superfluous.
Yes, a relational database is nothing more than a repository where data is stored. However, the stability of the entire application where this data is used will depend on how competently you organize it.
Relational DBMS focused on the implementation of operational data processing systems are less efficient in analytical processing tasks than multidimensional databases. This is due, firstly, to the presence of rather strict restrictions imposed by the existing implementation of the SQL language. An example of such a real-life constraint is the assumption that the data in a relational database is unordered (or more accurately, randomly ordered). At the same time, their ordering requires additional time spent on sorting each time the database is accessed. In analytical systems, data entry and selection is carried out in large portions. In turn, the data, after they enter the database, remain unchanged for a long period of time. And here it is more efficient to store data in the form of partially denormalized tables, which can store not only granular values, but also pre-computed aggregates to increase performance. And for navigation and selection, specialized addressing and indexing methods based on the assumption of low variability and low mobility of data in the database can be used. This way of organizing data is sometimes called precomputed, thereby emphasizing its difference from the normalized relational approach, which involves the dynamic calculation of various types of totals (aggregation) and the establishment of relationships between attributes from different tables (join operations).
In addition to the low efficiency, which was mentioned earlier, the disadvantages of traditional relational DBMSs can be attributed to the fact that as the main and, often, the only mechanism that provides fast search and selection of individual rows in a table (or in tables linked through foreign keys), usually various modifications of B-tree indexes are used. This solution turns out to be effective only when processing small groups of records and high intensity of data modification in databases.
Relational DBMSs may never leave the scene, but the days of their reign are definitely numbered, says Paul Creel, who published an article about it in InfoWorld in September 2011. He quotes analyst Robin Blore, who argues that the architecture of relational DBMSs is morally outdated, since it was created in the past era and does not meet modern requirements.
Relational DBMSs still dominate financial transaction processing systems, but today companies are increasingly adopting the new NoSQL architecture - scale-out, distributed, and open source. Examples of such systems are Hadoop, MapReduce, and VoltDB. According to Forrester analysts, about 75% of data in enterprises is either semi-structured information (XML, email, and EDI) or unstructured (text, images, audio and video), and only 5% of this data is stored in relational databases, and the rest - in databases of other types or in the form of files, and not subject to processing by relational systems.
In the past five years, the popularity of NoSQL databases has been growing. Initially, the term appeared in the late 90s of the 20th century. This was the name of a database created by Carlo Strozzi, which stored all data in the form of files, and instead of using SQL to access the data, it used command line scripts (shell script). However, the concept of NoSQL databases, in the form in which it is now known, was formed only in 2009. The name, which originally stood for "No SQL", that is, "Without SQL", is now interpreted as "Not only SQL" - "not only SQL", and implies that SQL support is only one of the components of the system.
Unlike relational DBMS, NoSQL is based on BASE principles :
Based on this, it can be understood that in NoSQL databases, speed and flexibility are more important criteria, rather than complete data integrity.
Data storage in NoSQL databases differs depending on the types:
There are also multi-model databases that include two or more of the above categories.
From the above descriptions of the types of databases it follows that NoSQL databases are unstructured. This property of NoSQL databases allows you to make changes to the content of the database with minimal impact on the code of the project being developed. Also, the advantages of a NoSQL database include the speed of work. In addition, unlike relational databases, NoSQL databases can store data of any type (including files), while their processing speed does not decrease. . Another advantage can be considered the speed of development - it does not require a large amount of preparatory actions.
Thus, you can see that both types of databases have certain advantages and disadvantages that make each of the types more suitable for a particular project. So, for storing a large amount of unstructured data, it is the NoSQL database that is suitable, since it can provide speed, easy scalability with an increase in the amount of processed data, the ability to use cloud computing and storage and easier work with it, since serious work is not needed for narrowly targeted work. knowledge. In addition, due to the lack of structure, NoSQL databases are great for projects that expand very quickly and constantly change the stored data. At the same time, you will have to sacrifice universality, since when switching to another DBMS, both relational and non-relational, you will have to create everything practically from scratch, since different NoSQL DBMSs have different APIs; the reliability of the system and the integrity of the data, since first of all the speed of work is ensured. Therefore, NoSQL databases are suitable for Big Data and document management. They are also great for working in conjunction with SQL databases as cache storage.
The more classic and familiar option is still the RDBMS. Firstly, due to its versatility, because all relational databases use SQL, therefore, if necessary, changing the used DBMS will not entail excessive labor costs. Although relational DBMSs, with the creation of which did not imply that you would have to store large amounts of data, do not have the same speed as NoSQL databases, they have greater reliability, predictability and data integrity due to the rigid definition of how exactly transactions interact with the database. The structuredness, although it is an undoubted advantage of this type of database, can also be a disadvantage in specific projects. Relational databases can be used in any projects, although in some they are inferior to NoSQL databases.
To summarize, we can say that NoSQL databases are not going to be a full replacement for relational databases. NoSQL databases implement fundamentally different paradigms of data presentation, and are designed to work in conjunction with SQL databases or in certain areas. Don't think of NoSQL as the successor to relational databases - rather as an add-on, a useful tool in certain situations.
Thus, can summarize the following.
In conclusion, I would like to note that relational databases are one of the most common, due to their simplicity and clarity at the stage of creation and at the user level. It should also be noted that the main advantage of the RDB is its compatibility with the main query language SQL, which is intuitive for users.
Nevertheless, with all the variety of approaches, there are still some canons, the violation of which greatly affects both the design of the database and its operation. So, for example, the problem of database normalization is very relevant. Neglecting normalization makes the database structure confusing and the database itself unreliable.
Promising directions include the development of queries to a relational database using heuristic methods, as well as a method of accumulating previously optimized queries, followed by checking the derivability of the current query from the accumulated ones.
Finally, we are most likely seeing a very slow decline in relational databases. Although they are still the main storage medium, especially in large enterprise projects, they are gradually being replaced by non-relational solutions, which will become the majority over time.
Результаты процедуры рецензирования статьи
В связи с политикой двойного слепого рецензирования личность рецензента не раскрывается.
The journal allows the author(s) to hold the copyright without restrictions. All authors automatically own full copyright in their work as soon as they create it, and current Russian Federal legislation protects them.
Licence type: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
The journal is an open access journal which means that everybody can read, download, copy, distribute, print, search, or link to the full texts of these articles in accordance with Creative Commons Attribution- NonCommercial 4.0 International License.
You are free to:
Share — copy and redistribute the material in any medium or format.
Adapt — remix, transform, and build upon the material The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial — You may not use the material for commercial purposes.
No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.