Analyzing the un-accounted gas causes in Indonesian household gas networks using text analytics

The development and growth of the domestic gas network in Indonesia is one of the Government's efforts towards National Energy Transformation, or Net Zero Carbon Emissions by 2060. This development faces a number of challenges that the parties concerned must address. One of the major challenges in developing the household gas network (jargas) is unaccounted gas (UAG), or gas losses: an amount of gas whose fate is unknown. For example, the UAG value in one region in 2022 averaged -24.55%. Because the parties need to know what drives the UAG value, this research applies text analytics to operational data, such as monthly reports containing free text, to identify the factors that cause UAG in jargas. The analysis uses a number of Python-supported tools, including Anaconda, together with Power BI. The data comprise 3,584 word strings with 42 UAG cause variables, which were extracted into 60 tokens or keywords; the analysis was then performed on 52 unique tokens and finally focused on 19 main tokens. The results show that the -24.55% UAG is caused by leakage (15.56%), third-party activity (3.52%), unregistered theft (2.22%), gas-in activity (1.99%), and the accuracy of measuring instruments (1.42%). Based on these results, it is suggested that field data validation be carried out in each area, adding keywords not yet included among the 60 keywords of this study. This is an open access article under the CC BY-SA license.


INTRODUCTION
Indonesia is one of the countries with the largest natural gas reserves in the world (Dutu, 2016; Hartono et al., 2017; Hutagalung et al., 2019; Purwanto et al., 2016; Sugiyono & Adiarso, 2021). In addition to having large natural gas reserves, Indonesia ranked 11th among the largest natural gas producing countries in 2018 (Erdiwansyah et al., 2021; Rahman et al., 2021). Previously, in 2000, Indonesia was ranked 6th in the world, with natural gas production reaching 70.5 million m3/year. Within Southeast Asia, Indonesia is one of the countries that has developed natural gas production since 1960, with growth beginning in 1978 at a production of 11.25 million m3/year. From the first year, Indonesia's natural gas production has been used and exported to the East Asian and Singapore markets. Users in East Asia generally receive Indonesian natural gas in the form of liquefied natural gas (LNG), while Singapore receives it by pipeline.
Domestic demand for natural gas in Indonesia began in 1965 and increased rapidly in 1974, when one of Indonesia's State-Owned Enterprises joined in the utilization of natural gas. By 2017, natural gas had become the third most widely used primary energy in the country after petroleum and coal. For this reason, natural gas plays an important role in Indonesia's energy mix policy. In terms of regulations and policies, the Government of Indonesia continues to aggressively encourage the utilization of domestic natural gas, including through the development of natural gas infrastructure, both pipeline and non-pipeline (LPG, CNG, or LNG), to stimulate domestic industries and maintain a cleaner environment. In addition to infrastructure, the government is also intensifying the search for new natural gas reserves.
The increasing development and management of natural gas in Indonesia is in line with the Government of Indonesia's strategic plan for the success of the energy transition program. Energy transition is one of the priority programs, under which a nationwide shift will be made from fossil energy towards renewable or more environmentally friendly energy (Cantarero, 2020; Chapman et al., 2018). Natural gas is one of the more environmentally friendly energies compared to other fossil energy sources (Kumar et al., 2011; Mohammad et al., 2021; Omer, 2017; Safari et al., 2019; Stambouli, 2011). Until 2040, energy demand in Indonesia is estimated to increase by 2.1% every year, alongside a target of reducing emissions by 29% by 2030. This means that in 2040 national energy demand will reach 10 million terajoules, a significant increase from the 2020 demand of 5 million terajoules.
Infrastructure development and the strengthening of sustainable national energy policies will depend heavily on how energy needs are managed (Ghorashi & Maranlou, 2021; Goldthau, 2014; Kaygusuz, 2012). Energy needs in Indonesia in 2021 were still dominated by oil and gas. Under these conditions, Indonesia's trade balance will continue to be burdened by oil and gas imports, which prevent the State Budget from being used optimally for infrastructure development and ultimately push the balance into deficit. For this reason, through the Annex to Presidential Regulation No. 58 of 2017, letter K (Gas Pipeline / LPG Terminal Projects), No. 122, the Government of Indonesia has committed to the development of natural gas networks for households and to transforming the national energy supply. This energy transformation is carried out as an effort to reduce fossil energy (LPG) imports, which are worth up to 6 million tons every year.
Efforts to reduce dependence on LNG imports were strengthened again by the Government of Indonesia. Energy transformation policies, and the large-scale effort to reduce LPG imports by expanding from 400,000 household connections to 6,000,000 household connections as end customers by 2025, pose new challenges for business actors in the natural gas management sector. One of these challenges lies in the operational activities of natural gas distribution. In gas distribution to household customers in particular, the amount of gas delivered to end customers is not the same as the amount of gas that business actors receive from gas suppliers or gas transportation service providers. This discrepancy is often referred to as gas losses, or Un-Accounted Gas (UAG).
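The discrepancy described above can be expressed as a percentage. The following is a minimal sketch, assuming that UAG is measured relative to the volume received and that a negative value denotes a loss (consistent with the negative UAG values reported in this paper). The function name and the sample volumes are hypothetical, and corrections such as linepack or own use are deliberately ignored.

```python
def uag_percent(received_m3: float, delivered_m3: float) -> float:
    """Return UAG as a percentage of received gas (assumed convention: negative = loss)."""
    if received_m3 <= 0:
        raise ValueError("received_m3 must be positive")
    return (delivered_m3 - received_m3) / received_m3 * 100.0


# Hypothetical volumes, chosen only to illustrate the sign convention.
example = uag_percent(received_m3=1000.0, delivered_m3=800.0)
print(f"UAG = {example:.2f}%")
```

Under this convention, delivering less gas than was received yields a negative percentage.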
To answer the challenges of UAG control in natural gas business management, a study of UAG management is needed so that the amount of gas lost and/or not counted as part of the natural gas trade can be minimized, and so that business actors can align with the Indonesian government's program of national energy transformation.
This research has several objectives: to obtain a method for analyzing Un-accounted Gas in the household gas network (Jargas, Jaringan Gas) in Indonesia; to find the causes of Un-accounted Gas in the household network sector in Indonesia; and to formulate a control and improvement program for Un-accounted Gas in natural gas distribution companies in that sector. The research is expected to benefit several parties: 1) parties related to the natural gas management and distribution industry in Indonesia, including the Government, business entities, consumers, gas suppliers, gas transporters, practitioners, and academics, by describing the risks faced by business entities in natural gas distribution activities; 2) natural gas distribution business entities in the household network sector, by offering ideas for controlling Un-accounted Gas; and 3) subsequent research, by supporting the analysis and modeling of Un-accounted Gas control as an effort to reduce risk to natural gas distribution companies.

METHOD
This research is a problem-solving study that formulates the factors causing Un-accounted Gas based on the data available as research material, namely operational data on natural gas distribution in the household sector. Data are processed using models suited to the problems and research objectives.
The text analytics method was chosen because the data available to the household gas industry are still largely limited to free text, that is, mostly text rather than numbers. The data used in this study consisted of:
1) Routine report data, obtained from the reports of each area and/or region to the business management function and stakeholders. The data are unstructured text reflecting the conditions of each area and/or region, are closed, and are managed by business entities. They are obtained periodically, with monthly, quarterly, and yearly reporting periods. An example is "Quarterly Report Data on the management of Household Natural Gas Networks for the period TW I 2020 to TW II 2022" (TW, triwulan, meaning quarter).
2) Non-routine report data. Non-routine reports are divided into two kinds: reports received directly from customers or reporting parties through media provided by business entities, and reports sent when a special event occurs or at the request of stakeholders. An example is the customer complaint report published daily by the contact center and/or customer management function of each area; these reports are then combined into one document containing the time, name, location, description, and other information related to the research.
3) Supporting data, namely secondary data obtained from the annual reports of business entities published through public company management institutions in Indonesia. The annual reports used are those for 2020 and 2021, with audited status, published publicly through the Indonesia Stock Exchange.
4) The type of data in this study is unstructured data in the form of free text, which has no standard writing pattern or structure in terms of word length, content, theme, or letter form.
Operational data analysis for the distribution of household network natural gas in this study consists of several stages and data distributions, both by volume and by working area as assigned by the Government to business entities, regulated as follows. To avoid potential branding risks, the names of the work areas and the natural gas supplier sources for each area are replaced with initials. The research report uses data from 50 assignment areas, while the research worksheet uses 74 work areas as the scope of the research.

RESULTS AND DISCUSSION
Test material data preparation
The research using data for the period of Quarter I of 2020 to Quarter IV of 2022 was carried out by utilizing tools that support Python such as Anaconda and other supporting tools.

Loading and transforming data
Loading and transforming data is carried out to obtain better test data, so that the test results are not influenced by extraneous data and the research data are completely clean before the analysis begins. Data transformation at this stage focuses on data cleaning, including eliminating and/or filling in data that is null or empty, according to the availability and intent of the data. In addition to handling null data, this stage also removes excessive spaces, digits, and punctuation, and reduces typos or writing errors that appear when loading data, whether from human error or from data transmission.
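The cleaning steps described above can be sketched in a few lines. This is an illustrative sketch using only the Python standard library, not the authors' actual pipeline: the function names and sample entries are hypothetical, and typo correction is left out because the source does not say how it was done.

```python
import re
import string

# Translation table that turns every ASCII punctuation mark into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(raw):
    """Normalize one free-text report entry; return None when nothing is left."""
    if raw is None:
        return None
    text = raw.lower()
    text = re.sub(r"\d+", " ", text)          # remove digits
    text = text.translate(_PUNCT_TO_SPACE)    # remove punctuation
    text = re.sub(r"\s+", " ", text).strip()  # collapse excessive spaces
    return text or None


def clean_reports(rows):
    """Clean every entry and drop null or empty ones."""
    cleaned = (clean_text(row) for row in rows)
    return [text for text in cleaned if text is not None]


# Hypothetical raw entries, including a null and a whitespace-only entry.
raw_rows = ["  Pipe  LEAKED, 3 points!! ", None, "   ", "Meter stuck."]
print(clean_reports(raw_rows))
```

Dropping empty entries is only one of the two options named in the text; filling them in would need domain knowledge that a generic sketch cannot supply.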

Test material data mapping
After transforming the data, an initial test was carried out: measuring which words appear most often in the test material data using a bag-of-words (BoW) model. The test material data are mapped to determine the character of the data to be tested, which facilitates further data management. In this mapping, data are grouped based on free text sourced from complaints and job descriptions in the test data. The results of the mapping are shown in Figure 2, Bag of Word Test Material Data Results. Once the most frequent text is known from the BoW mapping, the correlation must be tested between the sources of the most frequent text data, the keywords that have been formulated, and other variables related to the research objectives. The results of the relationship test between data variables and the information available in the test data are shown in Figure 3, Correlation between Time Period and Data Source Area.
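A bag-of-words count of the kind described above can be sketched with `collections.Counter`. The tokenization (whitespace split on already-cleaned text) and the sample documents are assumptions for illustration; the source does not specify which BoW implementation was used.

```python
from collections import Counter


def bag_of_words(documents, top_n=5):
    """Count word frequencies over all documents and return the top_n words."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.split())  # assumes text was already cleaned and lowercased
    return counts.most_common(top_n)


# Hypothetical cleaned report texts.
docs = ["pipe leaked at coupling", "meter stuck", "pipe leaked near joint"]
for word, freq in bag_of_words(docs, top_n=2):
    print(word, freq)
```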

Figure 3. Correlation between Time Period and Data Source Area
The correlation between the time period and the data source area is needed to determine the consistency of the data and the amount of the area's contribution to the data to be analyzed, so that it can help in processing the next data, namely grouping the area with available text data sources.
Based on the area of operation, the source text data of the test material can be categorized into four groups, with the test areas as shown in Figure 4. The results of grouping the regions and text source areas are needed to determine the distribution of the text source data, which becomes material for further analysis in line with the purpose of the study. The next stage of mapping the test material data is to map the year of operation of areas and facilities based on data availability; this serves as input when testing the gas losses results on the test material.

Figure 5. Conduct Keyword Testing
Using the 60 keywords that have been formulated, keyword occurrence testing shows that 33 keywords do not appear in the research data, which means that these keywords were never included in the operational or commercial text reports in the period 2020 to 2022. Meanwhile, 19 keywords appear in the research data, the most frequent being "leaked" with 717 occurrences, equivalent to 48.94% of the total keyword occurrences, with details as follows. A zero occurrence rate is generally caused by the absence of reporting that uses the keyword, even after data pre-processing, so such keywords need to be replaced with other keywords of the same intent or meaning.
Keyword occurrence rate analysis
Region (Wilayah) category analysis
The occurrence rate of keywords by region, with the regions divided into four (SOR 1, SOR 2, SOR 3, and SOR 4), was computed using Anaconda and the Python tools integrated in it, and is shown in Figure 7.
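The keyword test described above amounts to counting whole-phrase matches and expressing each count as a share of all keyword occurrences. The sketch below assumes regular-expression matching on word boundaries; the helper names, documents, and three-keyword list are hypothetical stand-ins for the study's 60 keywords.

```python
import re


def count_keywords(documents, keywords):
    """Count whole-phrase occurrences of each keyword across all documents."""
    counts = {}
    for kw in keywords:
        # Word boundaries stop "meter" from matching inside "diameter".
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b")
        counts[kw] = sum(len(pattern.findall(doc)) for doc in documents)
    return counts


def occurrence_share(counts):
    """Return each keyword's share (%) of all keyword occurrences."""
    total = sum(counts.values())
    return {kw: (100.0 * n / total if total else 0.0) for kw, n in counts.items()}


# Hypothetical cleaned texts and keyword list.
docs = ["pipe leaked at coupling", "leaked joint and stuck meter", "gas in for new customer"]
keywords = ["leaked", "stuck meter", "venting"]

counts = count_keywords(docs, keywords)
shares = occurrence_share(counts)
zero_hits = [kw for kw, n in counts.items() if n == 0]
print(counts, zero_hits)
```

By the same arithmetic, the reported 717 occurrences of "leaked" at 48.94% imply roughly 1,465 keyword occurrences in total (717 / 0.4894).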

Figure 7. Keyword Occurrence Rate by Region
Keywords appear more often in SOR 1 than in SOR 2, SOR 3, or SOR 4, and the occurrences are dominated by leaks on the network. The verification results show that as many as 250 leaks were accompanied by follow-up words such as leaking at the coupling, caused by third parties, leaking at the connection, and others. Likewise for the reporting period, TW I 2021 produced more "leaked" occurrences and more cumulative keyword occurrences than any other period.
The keywords that appear most often in SOR 1 are "leaked" (250 times), followed by volume estimation and stuck meters. The same pattern occurs in the other regions: in SOR 2, SOR 3, and SOR 4, leaks, volume estimation, and stuck meters are also the top three keywords. Figure 8 shows the distribution of keyword occurrences by region.
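Grouping keyword occurrences by region can be sketched as a nested counter keyed by group. The record layout (a region label paired with a cleaned text) and the sample rows are assumptions for illustration, since the source does not describe its data schema.

```python
import re
from collections import Counter, defaultdict


def occurrences_by_group(records, keywords):
    """Return {group: Counter({keyword: occurrences})} for (group, text) records."""
    patterns = {kw: re.compile(r"\b" + re.escape(kw) + r"\b") for kw in keywords}
    table = defaultdict(Counter)
    for group, text in records:
        for kw, pattern in patterns.items():
            table[group][kw] += len(pattern.findall(text))
    return table


# Hypothetical records: (region, cleaned report text).
records = [
    ("SOR 1", "pipe leaked at coupling"),
    ("SOR 1", "leaked joint reported"),
    ("SOR 2", "stuck meter at customer"),
    ("SOR 2", "pipe leaked"),
]
table = occurrences_by_group(records, ["leaked", "stuck meter"])
for region in sorted(table):
    print(region, table[region].most_common(3))
```

The same function could be keyed on the reporting period or the area instead of the region, by changing the first element of each record.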

Figure 8. Occurrence of Keywords by Region
Figure 8 shows that the SOR 1 and SOR 4 regions have different patterns for the keywords "independent meter recording" and "meter recording". "Independent meter recording" occurs 10 times in SOR 1 but 2 times in SOR 4, while "meter recording" occurs 2 times in SOR 1 but 15 times in SOR 4. This needs to be verified and confirmed against actual reporting conditions, considering that the two keywords have the same meaning and the same influence on the surrounding free-text sentences.
The occurrence of keywords in each region is then analyzed by adding accompanying words to the main keyword. Each main keyword with a nonzero occurrence count is paired with a second keyword that continues or describes the main keyword, such as location, cause, time, and others. The following is an example of a main keyword paired with a second keyword, by region.

Figure 9. Occurrence of Accompanying Keywords
The occurrence of the main keyword paired with a second keyword shows that SOR 3 has the largest count at 154 occurrences, SOR 2 and SOR 1 have 124 occurrences each, and SOR 4 has 42.
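Pairing a main keyword with an accompanying keyword can be approximated as a sentence-level co-occurrence count. This is a sketch under stated assumptions: sentences are split on periods, semicolons, and newlines; both keywords are single words; and the sample texts are hypothetical.

```python
import re
from collections import Counter


def accompanying_keywords(documents, main_keyword, second_keywords):
    """Count sentences in which main_keyword co-occurs with each second keyword."""
    pairs = Counter()
    for doc in documents:
        for sentence in re.split(r"[.;\n]+", doc):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            if main_keyword not in words:
                continue
            for second in second_keywords:
                if second in words:
                    pairs[(main_keyword, second)] += 1
    return pairs


# Hypothetical report texts.
docs = ["Pipe leaked at coupling. Meter stuck.", "Leaked nipple, leaked pipe"]
pairs = accompanying_keywords(docs, "leaked", ["coupling", "joint", "nipple", "pipe"])
print(pairs.most_common())
```

Each sentence contributes at most one count per pair, so repeated mentions inside a single sentence are not double-counted.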

Area category analysis
The occurrence of keywords in each area is shown in Figure 10. The results indicate that the Medan area is the largest contributor to the Un-accounted Gas value as determined by keyword occurrences, with 128 occurrences, corresponding to an average UAG of -47.76%, with causes including leaks, boortap, and damage to facilities. The area contributing least to UAG is Tanjung West West Java, with an average UAG realization of -18.34%.
To confirm and test data in areas that have a low occurrence rate, it is necessary to add test data in the form of historical data on gas consumption per customer in each reporting period, upstream and downstream pressure at each reporting point, the type of meter or measuring instrument used, historical hourly volume for analysis details, historical operating conditions, and other records such as calibration and maintenance periods performed.
When keywords are paired with accompanying keywords, the area that previously ranked highest for keyword occurrence ranks lower than other areas. This may be because the form and arrangement of reporting does not include the cause behind a keyword, or because reports from areas with a previously high occurrence rate did not detail the reporting sentence.

Construction period category analysis and reporting
The analysis in Figure 12 shows that in TW I 2021 the keyword occurrence rate was greater than in other periods; cumulatively, keyword occurrences are dominated by 2021 compared to 2020 and 2022. The occurrence of keywords in TW I 2021 is inversely proportional to the number of new-customer gas-in events, which is 6, while the number of leaks recorded is 106. This indicates that an increase in gas-in is not related to the number of leaks in an area or region, nor is the amount of flushing.

Verify the occurrence of the keyword "leaked"
The occurrence of the word "leak" is verified by checking household natural gas networks in areas with a high keyword occurrence rate. Of the 717 occurrences, field verification is carried out in nine areas that account for 80% of the keyword occurrences. The words that most often accompany the keyword "leaked" concern the location or place, the part, and the time. The second keywords that dominate alongside "leaked" are "coupling", "joint", "nipple", and "pipe".
The distribution of the first keyword "leak" and the second or accompanying keywords by area shows that the Blora and Mojokerto areas each have seven occurrences of leaks on nipples and connecting hoses accompanying the keyword "leak", followed by the Bekasi and Cirebon areas, with the following details. Based on this distribution, field verification is carried out by visiting the areas where the keyword appears most often, as in Figure 14, Distribution of keywords accompanying the keyword "leaked". The results of field testing and verification show evidence of leaks at the reported locations; the following are some examples of field conditions at household gas network leak locations. The number of leaks is equivalent to 9,312 m3, or 1.42%. The 14.23% leakage rate therefore results in a gas distribution volume greater than the average gas use of a household connection in general, equivalent to 129.96 m3 per point of leakage.
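Selecting the areas that account for 80% of keyword occurrences is a Pareto-style cut-off. The sketch below assumes a simple rule (add areas in descending order of count until the cumulative share reaches the threshold); the area labels and counts are hypothetical, and the source does not state which rule was actually applied.

```python
def areas_covering_share(counts_by_area, share=0.80):
    """Return the top areas whose cumulative occurrences first reach the given share."""
    total = sum(counts_by_area.values())
    selected, running = [], 0
    ranked = sorted(counts_by_area.items(), key=lambda item: item[1], reverse=True)
    for area, count in ranked:
        if total == 0 or running >= share * total:
            break
        selected.append(area)
        running += count
    return selected


# Hypothetical occurrence counts per area.
counts_by_area = {"Area A": 60, "Area B": 25, "Area C": 10, "Area D": 5}
print(areas_covering_share(counts_by_area))
```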

Verify the occurrence of the keyword "meter"
By taking into account the volume of gas in the medium-pressure and low-pressure networks, as well as the different types of measuring instruments used for household customers (turbine and diaphragm meters) that remain within the % error threshold for RT measuring instruments, the magnitude of the effect of stuck meters, damaged EVCs, and inaccurate measuring instruments in the research data can be verified using equations commonly used in the natural gas industry. These are the equation for linepack gas (gas stored in pipelines), the volume of gas leaving the network and infrastructure, and the consumption of each piece of equipment used in natural gas distribution facilities. The results are then compared with the accuracy levels for measuring instruments set out in the Decree of the Director General of the Ministry of Trade of the Republic of Indonesia No. 29 of 2010 concerning diaphragm gas meters.
In addition, the second keyword in each area is counted for the keyword "meter" using the accompanying word "jam" (stuck): a sentence containing both "meter" and "jam" is counted as "jam meter" (stuck meter). This produces the data shown in Figure 16 below.

Figure 16. Recapitulation of Stuck Meters by Area
The Semarang area has the highest occurrence rate of the word accompanying "meter", with 10 occurrences, followed by Jakarta Rusun with 6 and Cirebon with 5; the others each appear no more than 3 times. This shows that these areas face challenges in maintaining and repairing meters, so further verification is needed using metering analysis and other methods.
Verification is carried out by calculating each component so that the following household gas distribution profile is obtained. Verification of the keyword "meter" is limited to the occurrence of text, so it cannot describe the accuracy of each meter, the maximum or minimum volume, or other meter conditions such as stuck meters, which appear 25 times, equivalent to 1.71% of keyword occurrences.

The Occurrence of the Keyword "Gas In"
Based on the test results, the keyword "gas in" occurs 119 times, equivalent to 8.12% of the total occurrences of the 60 keywords in the study. Several areas have a higher occurrence rate of "gas in" than others, namely Pelalawan, Jargas Muara Enim (Empat Petulai Dangku), OKU Timur, Tanjung Jabung Barat, Gresik, Lumajang, and Wajo (Gilireng). Field verification combined with supply volume, sales volume, and changes in the number of customers confirms the gas-in occurrence rates found in the study, as in the following example from one of the areas. In the period before the keyword "gas in" appeared, the channeled volume was 1,451.8572 m3; in the period in which the keyword appeared, the channeled volume became 1,557.9459 m3.
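The change between the two channeled volumes quoted above can be worked out directly. The sketch below only restates the arithmetic on the two figures from the text; the function name is hypothetical, and the volumes are assumed to be in m3 as written.

```python
def volume_change(before_m3, after_m3):
    """Return the absolute change (m3) and the relative change (%) between two periods."""
    delta = after_m3 - before_m3
    return delta, 100.0 * delta / before_m3


# Volumes quoted in the text for the periods before and during "gas in".
delta, pct = volume_change(1451.8572, 1557.9459)
print(f"{delta:.4f} m3 ({pct:.2f}%)")
```

On these figures the channeled volume rises by about 106.09 m3, or roughly 7.3%.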

CONCLUSION
The research analyzes Un-accounted Gas in Household Gas Networks (Jargas) in Indonesia using a text mining method. Factors contributing to UAG include leaks, gas-in, flushing, arrears, hoeing, anomalies, filling, tapping, network damage, third-party work, pressure conversion, theft, gas odor, damage, and stuck meters. Control and improvement programs include fixing leaks, calculating gas for new customers, preventing theft and illegal tapping, and improving conversion calculations.

Figure 1. Transform Data by Removing Null/Empty Data

Figure 2. Bag of Word Test Material Data Results

Figure 4. Results Grouping of Regions and Data Source Areas of Test Material Text

Figure 6. Keyword Test Results
In the test results there are 19 keywords with varying occurrence rates, while there are 34 keywords with zero occurrence rates, namely Under cotton, EVC damaged, Filling, Un-register, Unpaid, ID lost, Not registered, No BBG, Not calibrated, Wrong Conversion, Wrong Conversion, Wrong Composition, Deviation meter, Meter correction, Over cotton, Setting meter, Linepack, Wrong nomination, Venting, Blow down, Fuel Gas, Sampling, Heels, Own use, Facility damage, Illegal tapping, Gas not recorded, Unreadable, Contract expired, Volume estimation, Wrong input, Wrong forecast, Audit system.

Figure 10. Analysis Results of Keyword Occurrences Per Area

Figure 11. Areas by Number of Occurrences of Accompanying Keywords

Figure 12. The Appearance of the Main Keyword in the Construction Period

Figure 13. Dominant Keywords by Reporting Period

Figure 15. Field verification in areas where the keyword "leaked" appears
With the occurrence of keywords in the study area verified, and based on 654,975 household gas customers (SR) at the end of 2021 and an average gas usage of 13 m3/SR, the 717 leaks are equivalent to 9,312 m3, or 1.42%.

Figure 17. Verification Results of the Occurrence of the Keyword "Meter"
The household gas network targets are set through Presidential Regulation No. 22 of 2017, Part IV of the National Energy Management Policy and Strategy, one point of which is building a city gas network for 4.7 million household connections, equivalent to 0.7 million tons of LPG, by 2025; and through Annex II of Presidential Regulation No. 18 of 2020, the Strategic Priority Projects of the National Medium-Term Development Plan 2020-2024, namely city gas network infrastructure for 4 million house connections in 2024.

Table 1. Division of Assignment Areas for Research